Re: how to sanitize MS Word HTML output?

Author: Lisa Kachold
Date:  
To: Main PLUG discussion list
Subject: Re: how to sanitize MS Word HTML output?
You can try this Toasted Spam decoder (it will readily tell you whether
the text is decodable as quoted-printable):

http://www.toastedspam.com/decodeqp

Note that quoted-printable encoding is generally associated with
email, not with Word.
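
If you would rather decode the file locally than paste it into a web
form, here is a minimal sketch using Python's standard quopri module.
The function name, the sample text, and the UTF-8 default are my own
assumptions for illustration; a file that came out of Word may well be
windows-1252 instead, so adjust the charset to match your data.

```python
import quopri

def decode_quoted_printable(raw: bytes, charset: str = "utf-8") -> str:
    """Decode quoted-printable bytes and return the result as text."""
    # quopri.decodestring rejoins soft line breaks ("=" at end of line)
    # and turns =XX escapes such as =3D back into the original byte.
    decoded = quopri.decodestring(raw)
    # The charset is an assumption; undecodable bytes are replaced
    # rather than raising an error.
    return decoded.decode(charset, errors="replace")

# Made-up sample showing both symptoms from the original post:
# "=3D" in place of "=" and a line broken in the middle of a word.
sample = b'<p class=3D"MsoNormal">This line was bro=\nken mid-word.</p>\n'
print(decode_quoted_printable(sample))
```

If the output still looks like junk after this step, then the problem
really is the HTML and not the transfer encoding.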

On Mon, May 4, 2009 at 11:55 AM, Matt Graham wrote:

> From: "Steven A. DuChene"
> > It is filled with a lot of un-needed style and formatting tags
> > as well as all kinds of stupid extra characters due to some MS
> > "standard" character formatting stuff. Things like breaking lines
> > in the middle of words and then adding an equal sign at the end
> > of the broken line or replacing equal signs in the html code with
> > "=3D"
>
> That's not HTML. That's quoted-printable encoding. The mail client
> should've automatically converted that to UTF-8 or whatever when
> it saved the file. If you have MIME::QuotedPrint installed, you
> can decode that with a Perl one-liner and see if it looks any better.
>
> > Does anyone know of a tool that will clean this crappy excuse for
> > html code up into something more standard?
>
> "Demoroniser" is probably not what you want. I've seen a few things
> like that over the years, and have gotten rid of most of the junk
> with a bunch of regular expressions. Without a look at what the
> mangled HTML looks like, I couldn't give you a list of sed commands
> to feed this data through.
>
> --
> Matt G / Dances With Crows
> The Crow202 Blog: http://crow202.org/wordpress/
> There is no Darkness in Eternity/But only Light too dim for us to see
>
>
> ---------------------------------------------------
> PLUG-discuss mailing list -
> To subscribe, unsubscribe, or to change your mail settings:
> http://lists.PLUG.phoenix.az.us/mailman/listinfo/plug-discuss
>
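On the regular-expression approach Matt mentions above: without seeing
the actual file nobody can give an exact list, but a rough sketch of
the idea in Python might look like the following. The three patterns
are assumptions about clutter that commonly shows up in Word-exported
HTML (conditional comments, <o:p> tags, inline class and style
attributes), and the function name is made up. Regular expressions are
a crude tool for HTML, so check the result by eye.

```python
import re

# Hypothetical patterns for clutter often seen in Word-exported HTML.
# Adjust them after inspecting the real file.
WORD_CLUTTER = [
    # Conditional comments such as <!--[if gte mso 9]> ... <![endif]-->
    (re.compile(r"<!--\[if.*?<!\[endif\]-->", re.S), ""),
    # Office namespace tags such as <o:p> and </o:p>
    (re.compile(r"</?o:p[^>]*>", re.I), ""),
    # Quoted class="..." and style="..." attributes
    (re.compile(r"""\s+(?:class|style)=(?:"[^"]*"|'[^']*')""", re.I), ""),
]

def strip_word_clutter(html: str) -> str:
    """Apply each cleanup pattern in order and return the result."""
    for pattern, replacement in WORD_CLUTTER:
        html = pattern.sub(replacement, html)
    return html

# Made-up sample input for illustration.
messy = '<p class="MsoNormal" style=\'margin:0\'>Hi<o:p></o:p></p>'
print(strip_word_clutter(messy))
```

Run this only after any quoted-printable decoding, since the patterns
expect plain = signs rather than =3D.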




--
www.obnosis.com (503)754-4452
"Contradictions do not exist." A. Rand