From: "Steven A. DuChene"
> It is filled with a lot of un-needed style and formatting tags
> as well as all kinds of stupid extra characters due to some MS
> "standard" character formatting stuff. Things like breaking lines
> in the middle of words and then adding an equal sign at the end
> of the broken line or replacing equal signs in the html code with
> "=3D"

That's not HTML. That's quoted-printable encoding. The mail client
should've automatically decoded it to UTF-8 or whatever when it saved
the file. If you have MIME::QuotedPrint installed, you can decode it
with a Perl one-liner and see if it looks any better.

> Does anyone know of a tool that will clean this crappy excuse for
> html code up into something more standard?

"Demoroniser" is probably not what you want. I've seen a few things
like that over the years, and have gotten rid of most of the junk with
a bunch of regular expressions. Without seeing what the mangled HTML
looks like, I couldn't give you a list of sed commands to feed this
data through.

--
Matt G / Dances With Crows
The Crow202 Blog: http://crow202.org/wordpress/
There is no Darkness in Eternity/But only Light too dim for us to see
---------------------------------------------------
PLUG-discuss mailing list - PLUG-discuss@lists.plug.phoenix.az.us
To subscribe, unsubscribe, or to change your mail settings:
http://lists.PLUG.phoenix.az.us/mailman/listinfo/plug-discuss
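
For anyone without Perl handy, the quoted-printable decoding described above can be sketched in Python with the standard-library quopri module. This is only an illustration, not the Perl one-liner from the post: the function name decode_quoted_printable, the sample string, and the UTF-8 default are all assumptions. The real charset is whatever the message's Content-Type header declares.

```python
import quopri


def decode_quoted_printable(raw: bytes, charset: str = "utf-8") -> str:
    """Undo quoted-printable encoding and return the result as text.

    The charset default is an assumption; use the one named in the
    message's Content-Type header if it differs.
    """
    # decodestring() joins lines split with a trailing "=" (soft line
    # breaks) and turns =XX escapes such as =3D back into raw bytes.
    decoded = quopri.decodestring(raw)
    return decoded.decode(charset, errors="replace")


# Hypothetical sample showing both symptoms from the original complaint:
# "=3D" in place of "=" and a word broken by "=" at the end of a line.
sample = b'<a href=3D"http://example.com/">a link that was bro=\nken</a>\n'
print(decode_quoted_printable(sample))
```

If the decoded output still looks like a mess, the remaining junk is in the HTML itself rather than the mail encoding, and that is where the regular expressions come in.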