On Nov 2, 2009, at 11:32 AM, Craig White wrote:
> On Mon, 2009-11-02 at 08:31 -0700, Matt Graham wrote:
>> I spent 3 or 4 years doing stuff like this on the NYT, Wall Street
>> Journal, Christian Science Monitor, and Boston Globe. You will NOT
>> be able to get decent OCR with free software. Newspapers require
>> a different approach than most OCR packages take; you have to split
>> each article up into multiple individual image files and OCR each
>> file separately, then stitch the results back together. And editing
>> the results is totally necessary since newspaper text is so horrible
>> in quality.
> ----
> I don't know anything about GOCR at all.
>
> A few years ago I set up tesseract and it worked as well as I have seen
> any OCR program work (in terms of accuracy) though clearly there are
> many limitations compared to something like Omnipage. In the end it was
> rather easy to install and get it working.
>
> http://code.google.com/p/tesseract-ocr/
Google uses tesseract in their ocropus project. Ocropus seems
promising, but is still at a fairly early stage.
http://code.google.com/p/ocropus/
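
For what it's worth, here is a minimal sketch of the split / OCR / stitch
workflow Matt describes above. It assumes the article has already been
cropped by hand into one image file per piece, listed in reading order,
and that the tesseract command-line tool is on the PATH. The function
names (run_tesseract, ocr_article) and the file names are made up for
illustration, and the manual editing pass he mentions would still be needed.

```python
import subprocess
import tempfile
from pathlib import Path


def run_tesseract(image_path):
    """OCR a single image by shelling out to the tesseract CLI."""
    with tempfile.TemporaryDirectory() as tmp:
        base = Path(tmp) / "out"
        # "tesseract <image> <outputbase>" writes its text to <outputbase>.txt
        subprocess.run(["tesseract", str(image_path), str(base)], check=True)
        return base.with_suffix(".txt").read_text(encoding="utf-8")


def ocr_article(piece_paths, ocr=run_tesseract):
    """OCR each piece separately, in the order given, then stitch the text."""
    pieces = [ocr(path).strip() for path in piece_paths]
    # Skip pieces that came back empty; separate the rest with blank lines.
    return "\n\n".join(text for text in pieces if text)


if __name__ == "__main__":
    # Hypothetical file names: one cropped image per piece of the article.
    print(ocr_article(["headline.png", "column1.png", "column2.png"]))
```

The ocr argument is only there so a different engine (or a stub for
testing) can be swapped in without touching the stitching step.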
alex
---------------------------------------------------
PLUG-discuss mailing list -
PLUG-discuss@lists.plug.phoenix.az.us