how to sanitize MS Word HTML output?

Lisa Kachold lisakachold at obnosis.com
Mon May 4 12:29:18 MST 2009


You can try this ToastedSpam decoder (it will readily tell you whether
the text decodes as quoted-printable):

http://www.toastedspam.com/decodeqp

It should be noted that quoted-printable encoding is generally
associated with EMAIL (MIME), not with Word.
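If you would rather check locally than paste mail into a web page, Python's standard-library `quopri` module decodes quoted-printable text. It plays the same role as the Perl `MIME::QuotedPrint` one-liner Matt mentions, but it is a swapped-in tool, not his method. This is a minimal sketch. The function name `decode_qp` is hypothetical, and it assumes UTF-8 after decoding. Mail saved from Word may well be windows-1252, so pass a different `charset` if the output looks wrong.

```python
import quopri


def decode_qp(raw: bytes, charset: str = "utf-8") -> str:
    """Decode quoted-printable bytes (hypothetical helper).

    '=3D' becomes '=', and a line ending in '=' (a soft line break)
    is joined to the next line, which undoes the words broken mid-line.
    """
    decoded = quopri.decodestring(raw)
    # errors="replace" keeps going if the guessed charset is wrong
    return decoded.decode(charset, errors="replace")


if __name__ == "__main__":
    sample = b'<p class=3D"MsoNormal">Hello, this line was bro=\nken in the middle</p>'
    print(decode_qp(sample))
```

For a saved file, something like `decode_qp(open("message.htm", "rb").read(), "windows-1252")` would work. The filename and charset there are placeholders.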

On Mon, May 4, 2009 at 11:55 AM, Matt Graham <danceswithcrows at usa.net> wrote:

> From: "Steven A. DuChene" <linux-clusters at mindspring.com>
> > It is filled with a lot of unneeded style and formatting tags
> > as well as all kinds of stupid extra characters due to some MS
> > "standard" character formatting stuff. Things like breaking lines
> > in the middle of words and then adding an equal sign at the end
> > of the broken line, or replacing equal signs in the HTML code with
> > "=3D".
>
> That's not HTML.  That's quoted-printable encoding.  The mail client
> should've automatically converted that to UTF-8 or whatever when
> it saved the file.  If you have MIME::QuotedPrint installed, you
> can decode that with a Perl one-liner and see if it looks any better.
>
> > Does anyone know of a tool that will clean this crappy excuse for
> > html code up into something more standard?
>
> "Demoroniser" is probably not what you want.  I've seen a few things
> like that over the years, and have gotten rid of most of the junk
> with a bunch of regular expressions.  Without a look at what the
> mangled HTML looks like, I couldn't give you a list of sed commands
> to feed this data through.
>
> --
> Matt G / Dances With Crows
> The Crow202 Blog:  http://crow202.org/wordpress/
> There is no Darkness in Eternity/But only Light too dim for us to see
>
>
> ---------------------------------------------------
> PLUG-discuss mailing list - PLUG-discuss at lists.plug.phoenix.az.us
> To subscribe, unsubscribe, or to change your mail settings:
> http://lists.PLUG.phoenix.az.us/mailman/listinfo/plug-discuss
>
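Matt's point is that the right regular expressions depend on what the mangled HTML actually contains. As a starting point, here is a sketch that targets markup Word commonly emits: conditional comments, `<o:p>` tags, `Mso*` classes, inline `style` attributes, and `lang` attributes. It uses Python's `re` module instead of sed. The patterns are assumptions about typical Word output, not taken from the poster's file, and `clean_word_html` is a hypothetical name. Regexes over HTML are fragile, so diff the result against the original before trusting it.

```python
import re

# Assumed patterns for markup Word commonly emits; verify against your file.
WORD_JUNK = [
    # Conditional comments such as <!--[if gte mso 9]> ... <![endif]-->
    (re.compile(r"<!--\[if[^\]]*\]>.*?<!\[endif\]-->", re.S | re.I), ""),
    # Office namespace tags such as <o:p></o:p>
    (re.compile(r"</?o:[^>]*>", re.I), ""),
    # class=MsoNormal, class="MsoListParagraph", and similar Mso* classes
    (re.compile(r'\s+class="?Mso\w*"?', re.I), ""),
    # All inline style attributes (Word packs mso-* properties into them)
    (re.compile(r'\s+style="[^"]*"', re.I), ""),
    # lang attributes such as lang=EN-US
    (re.compile(r'\s+lang="?[\w-]+"?', re.I), ""),
]


def clean_word_html(html: str) -> str:
    """Strip common Word-specific markup (hypothetical helper)."""
    for pattern, replacement in WORD_JUNK:
        html = pattern.sub(replacement, html)
    # Unwrap <span> tags left with no attributes (single pass, so some
    # nested spans may survive; run it again if needed)
    html = re.sub(r"<span>(.*?)</span>", r"\1", html, flags=re.S | re.I)
    return html


if __name__ == "__main__":
    sample = ('<p class=MsoNormal style="margin:0in;mso-bidi-font-family:Arial">'
              '<span lang=EN-US>Hi<o:p></o:p></span></p>')
    print(clean_word_html(sample))
```

Run it after the quoted-printable has been decoded; otherwise the `=3D` sequences hide the attribute syntax the patterns look for.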



-- 
www.obnosis.com (503)754-4452
"Contradictions do not exist." A. Rand

