It has a lot of magic indeed. But as the comment on one line
of his script says:
# Impossible to do properly with a regex, I make do by allowing at most one
# level of nesting.
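The nesting problem is easy to see in a small sketch. RFC 822 lets a comment in parentheses contain further comments to any depth, and a regular expression without recursion can only spell out a fixed number of levels. Below is a minimal Python illustration, not Friedl's actual expression: `ONE_LEVEL` is a hypothetical pattern that allows one level of nesting, as the comment describes, and `balanced_comment` is a hypothetical depth counter. Both ignore backslash-quoted characters, which RFC 822 also permits inside comments.

```python
import re

# A comment with at most one level of nesting: "(", then any mix of
# non-paren characters or a single flat "( ... )" group, then ")".
# Simplification: backslash-quoted characters are not handled.
ONE_LEVEL = re.compile(r"\((?:[^()]|\([^()]*\))*\)")


def balanced_comment(text: str) -> bool:
    """Return True if text is one parenthesized comment, nested to any depth."""
    if not text.startswith("("):
        return False
    depth = 0
    for i, ch in enumerate(text):
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth == 0:
                # The outermost comment must close at the very end.
                return i == len(text) - 1
    return False  # ran out of text with parentheses still open


for s in ["(boss)", "(the (big) boss)", "(the (very (big)) boss)"]:
    print(s, bool(ONE_LEVEL.fullmatch(s)), balanced_comment(s))
# (boss) True True
# (the (big) boss) True True
# (the (very (big)) boss) False True
```

The counter needs only one integer of state, and that unbounded count is exactly what a non-recursive regex cannot keep.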
The fact that the regular expression is so complex, and still
doesn't match every valid email address, lends weight to the
argument that email addresses can't really be checked for
syntax unless you actually send email to the address and
receive email from it.
I remember Tom Christiansen wrote a script that tried its
best to determine whether an email address was valid or
not, and it came out to 800+ lines, using SMTP queries and
everything. Even after all that work, he admitted that it
couldn't be relied upon, because email syntax is inherently
impossible to match completely. (SMTP servers _can_ and
_do_ lie about which addresses they accept, for security's
sake.)
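The SMTP part of such a check comes down to opening a transaction and seeing whether the server accepts `RCPT TO` for the address. Here is a minimal Python sketch using the standard `smtplib` module. It assumes `mx_host` is already the mail exchanger for the address's domain (finding it takes a DNS MX lookup, not shown) and that `sender` is whatever envelope address you probe with; `probe` and `classify_rcpt` are hypothetical names, and this is not Tom's code.

```python
import smtplib


def classify_rcpt(code: int) -> str:
    """Translate an SMTP reply code to RCPT TO into what it actually tells us."""
    if code in (250, 251):
        # Accepted, but a server may accept every recipient and bounce later,
        # so this is not proof that the mailbox exists.
        return "accepted (not proof)"
    if 400 <= code < 500:
        return "temporary failure"
    if 500 <= code < 600:
        return "rejected"
    return "unexpected reply"


def probe(mx_host: str, sender: str, address: str, timeout: float = 10.0) -> str:
    """Ask mx_host whether it would take mail for address, without sending any.

    Assumes mx_host is the MX for the address's domain (lookup not shown)
    and that outbound connections to port 25 are allowed.
    """
    with smtplib.SMTP(mx_host, 25, timeout=timeout) as smtp:  # sends QUIT on exit
        smtp.ehlo_or_helo_if_needed()
        mail_code, _ = smtp.mail(sender)
        if mail_code != 250:
            return "sender refused"
        code, _reply = smtp.rcpt(address)
        smtp.rset()  # abandon the transaction; no DATA is ever sent
    return classify_rcpt(code)
```

A 250 only means the server claims it will take the message; a server that accepts every recipient and bounces later looks identical, which is the kind of lying described above. Many networks also block outbound port 25 entirely.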
Instead of relying on such a monstrous script, it would be
better to examine the structure of the file you're parsing
for emails and build in your assumptions about where an
address is likely to appear on each line. In other words,
parse based on what you know about the other parts of the
line rather than by looking for the address's syntax.
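As a sketch of that approach, suppose (purely as an example layout) that the file is a mail archive where each sender appears on a line of the form `From: "Display Name" <address>`. The hypothetical `addresses_from_lines` below trusts that position and takes whatever sits inside the last pair of angle brackets, without ever checking the address's syntax:

```python
def addresses_from_lines(lines):
    """Yield the text between < and > on lines that start with 'From:'.

    Assumed layout (an example, not a standard): every sender line looks like
        From: "Display Name" <address>
    so we rely on the position in the line, not on the address's syntax.
    """
    for line in lines:
        if not line.startswith("From:"):
            continue
        start = line.rfind("<")  # last "<", so a "<" in the display name is skipped
        end = line.find(">", start + 1)
        if start != -1 and end != -1:
            yield line[start + 1:end].strip()


sample = [
    'From: "David A. Sinck" <sinck@ugive.com>',
    "Subject: email regex",
    "From: Eden Li <eden.li@asu.edu>",
    "From: nobody in particular",
]
print(list(addresses_from_lines(sample)))
# ['sinck@ugive.com', 'eden.li@asu.edu']
```

For a different layout, such as a tab-separated file with the address in a known column, only the position logic changes; the idea of trusting the surrounding structure stays the same.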
Eden Li
eden.li@asu.edu
From: "David A. Sinck" <sinck@ugive.com>
> I'm going to have to ask you to look at
>
> http://public.yahoo.com/~jfriedl/regex/email-opt.txt
>
> then. That'll tell you if an email address is valid as an address.
>
> It's a bit long, but it has *all* the magic.
>
> It still overlooks someone/thing being on the other end though. You'd
> have to connect to the named smtp server and see if you can get it to
> accept it.