If you just want to clean up ugly HTML, I recommend HTML Tidy. It can be quite aggressive about cleanup, and because it reads standard input and writes standard output, it can be used in a pipeline.
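
As a rough illustration of the pipeline idea, here is a minimal Python sketch that pipes an HTML string through the `tidy` command-line tool. The function names `tidy_command` and `tidy_html` are made up for this example, and the option names (`--clean`, `--word-2000`, `--bare`) are as I recall them from Tidy's documentation, so check them against `tidy -help-config` on your installed version before relying on them.

```python
import shutil
import subprocess


def tidy_command(word_cleanup=True):
    """Build an argument list for the tidy command-line tool (hypothetical helper)."""
    # -q suppresses non-essential output, -asxhtml converts the output to XHTML,
    # --clean yes replaces presentational markup with style rules.
    cmd = ["tidy", "-q", "-asxhtml", "--clean", "yes"]
    if word_cleanup:
        # Options aimed at markup exported from Microsoft Word (assumed to be
        # supported by the installed tidy version).
        cmd += ["--word-2000", "yes", "--bare", "yes"]
    return cmd


def tidy_html(html_text, word_cleanup=True):
    """Pipe an HTML string through tidy and return the cleaned markup."""
    if shutil.which("tidy") is None:
        raise RuntimeError("tidy is not installed or not on PATH")
    result = subprocess.run(
        tidy_command(word_cleanup),
        input=html_text,
        capture_output=True,
        text=True,
    )
    # tidy exits with 0 (no problems), 1 (warnings) or 2 (errors).
    if result.returncode > 1:
        raise RuntimeError(result.stderr)
    return result.stdout
```

Calling `tidy_html(open("page.html").read())` on each converted file would be one way to run the cleanup in bulk; the same arguments work directly in a shell pipeline.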
Craig White wrote:
> I have a lot of documents to convert from Microsoft 'doc' format to HTML.
>
> I have found a number of tools to do that, but they don't really clean
> them up very well, which means that I will have to do a lot of hand edits.
> That isn't so bad, considering that I will undoubtedly have to do this
> anyway to get a common CSS and common headers/footers, etc.
>
> If anyone has suggestions on best methods for the above, I would
> appreciate it, but thus far I see little better than OpenOffice macros,
> which do the conversions in bulk.
>
> More importantly though, there is a structure to the storage...
>
> Base (subdirectory)
>   Section 1 (subdirectory)
>     Section 1A (document)
>     Section 1B (document)
>   Section 2 (subdirectory)
>     Section 2A (document)
>
> etc. and I would love for some methodology to build a table of
> contents/links to these documents automatically - and possibly even
> output the end result (the whole enchilada) perhaps in PDF so that I
> have other means to distribute this. I have seen many different
> publications that get built this way and I don't know how they
> accomplish this. Are there some open source tools that can do this?
>
> Craig
>
> ---------------------------------------------------
> PLUG-discuss mailing list - PLUG-discuss@lists.plug.phoenix.az.us
> To subscribe, unsubscribe, or to change your mail settings:
> http://lists.PLUG.phoenix.az.us/mailman/listinfo/plug-discuss
>
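
On the table-of-contents part of the question, one possible approach is to walk the directory tree and emit a nested HTML list of links. The sketch below assumes the documents have already been converted to `.html` files and kept in the Base/Section layout described above; `build_toc` is a made-up name for this example, not part of any existing tool.

```python
from html import escape
from pathlib import Path
from urllib.parse import quote


def build_toc(base):
    """Return a nested <ul> linking every .html file found under base."""
    base = Path(base)

    def render(directory):
        items = []
        for entry in sorted(directory.iterdir(), key=lambda p: p.name):
            if entry.is_dir():
                inner = render(entry)
                # Skip subdirectories that contain no HTML documents.
                if inner:
                    items.append(f"<li>{escape(entry.name)}\n{inner}\n</li>")
            elif entry.suffix.lower() == ".html":
                # Links are relative to base and percent-encoded so that
                # names containing spaces still work.
                href = quote(entry.relative_to(base).as_posix())
                items.append(f'<li><a href="{href}">{escape(entry.stem)}</a></li>')
        if not items:
            return ""
        return "<ul>\n" + "\n".join(items) + "\n</ul>"

    return render(base)
```

Writing the result of `build_toc("Base")` into an `index.html` inside `Base` would give a linked contents page. Producing a PDF of the whole set would be a separate step that this sketch does not cover.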