Re: Document Management

Author: Alex Dean
Date:  
To: Main PLUG discussion list
Subject: Re: Document Management

On Nov 2, 2009, at 11:32 AM, Craig White wrote:

> On Mon, 2009-11-02 at 08:31 -0700, Matt Graham wrote:
>> I spent 3 or 4 years doing stuff like this on the NYT, Wall Street
>> Journal, Christian Science Monitor, and Boston Globe. You will NOT
>> be able to get decent OCR with free software. Newspapers require
>> a different approach than most OCR packages take; you have to split
>> each article up into multiple individual image files and OCR each
>> file separately, then stitch the results back together. And editing
>> the results is totally necessary since newspaper text is so horrible
>> in quality.
> ----
> I don't know anything about GOCR at all.
>
> A few years ago I set up tesseract and it worked as well as I have seen
> any OCR program work (in terms of accuracy) though clearly there are
> many limitations compared to something like Omnipage. In the end it was
> rather easy to install and get it working.
>
> http://code.google.com/p/tesseract-ocr/


Google uses tesseract in their ocropus project. Ocropus seems
promising, but is still at a fairly early stage.
http://code.google.com/p/ocropus/
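
For what it's worth, the split / OCR / stitch workflow Matt describes could be sketched roughly as below. This is only an illustration under assumptions, not anything from his actual pipeline: it assumes each article has already been cut into separate image files (the cutting is the hard part and is not shown), that the tesseract command-line program is on the PATH, and the function names ocr_with_tesseract and ocr_article are made up for this example.

```python
import subprocess
import tempfile
from pathlib import Path
from typing import Callable, Iterable, List


def ocr_with_tesseract(image: Path) -> str:
    """OCR a single image by shelling out to the tesseract CLI.

    Assumes the `tesseract` binary is installed and on the PATH.
    """
    with tempfile.TemporaryDirectory() as tmp:
        out_base = Path(tmp) / "out"
        # Basic invocation: tesseract <image> <output base>
        # tesseract writes the recognized text to <output base>.txt
        subprocess.run(["tesseract", str(image), str(out_base)], check=True)
        return out_base.with_suffix(".txt").read_text(encoding="utf-8")


def ocr_article(
    pieces: Iterable[Path],
    ocr: Callable[[Path], str] = ocr_with_tesseract,
) -> str:
    """OCR each piece of one article separately, then stitch the text.

    `pieces` must already be in reading order; this sketch does not try
    to work out the order or the layout on its own.
    """
    chunks: List[str] = []
    for piece in pieces:
        text = ocr(piece).strip()
        if text:  # skip pieces that produced no text at all
            chunks.append(text)
    # Join the pieces with a blank line so the seams are easy to spot
    # when a human edits the result afterwards.
    return "\n\n".join(chunks)
```

Calling something like ocr_article(sorted(Path("some_article").glob("*.tif"))) would then give one text blob per article, where "some_article" is a hypothetical directory of pre-cut pieces whose file names sort in reading order. The output would still need the hand editing Matt mentions; the sketch only covers the mechanical part.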

alex
---------------------------------------------------
PLUG-discuss mailing list -
To subscribe, unsubscribe, or to change your mail settings:
http://lists.PLUG.phoenix.az.us/mailman/listinfo/plug-discuss