View Full Version : Scanner Help Needed


Nawlins
Aug 3, 2003, 04:39 AM
I'm looking for a scanner I can use to scan chapters of books, or entire books, onto my computer to mark them in my word processing software with underlining, bold, italics, etc. Scanning pictures would not be an issue. I've been told OCR is unreliable and often is unable to convert the .jpeg files into .txt files effectively. I'm using a 12" Powerbook with OS X and AppleWorks is my word processing software.

Any ideas?

Alex

Eniregnat
Aug 4, 2003, 04:33 PM
Search this site for advice about scanners and OCR software.

I can, in general terms, help you with your use of OCR and rendering software.

Working backwards, the final output from an OCR program can be simple (a text file with line breaks) or complex (e.g. a formatted file that includes information about columns). Where the file ends up is generally not important, so unless you need a very specific kind of formatting, just have the output be a TXT or RTF file. (You can open both with AppleWorks.)

Depending on the kind of OCR program, you can either import directly from the scanner or use another program to create the image files. Filtering out noise is easily done with programs like Photoshop, where you can even simplify flourished fonts by converting the image to simplified line art before saving in any number of file formats.
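To illustrate what "converting to line art" means, here is a minimal Python sketch of simple thresholding: every grayscale pixel becomes either pure black or pure white. The function names, the sample pixel values and the threshold of 128 are all assumptions made for the illustration; this is not how Photoshop or any particular OCR package does it internally.

```python
def to_line_art(gray_rows, threshold=128):
    """Turn grayscale pixels (0-255) into 1-bit line art.

    Returns 1 for white (paper) and 0 for black (ink).
    The default threshold of 128 is an assumed midpoint, not a tuned value.
    """
    return [[1 if px >= threshold else 0 for px in row] for row in gray_rows]


def render(bits):
    """Draw the 1-bit image as text: '#' for ink, '.' for paper."""
    return "\n".join("".join("." if b else "#" for b in row) for row in bits)


# Hypothetical 4x3 grayscale patch: light paper with a dark stroke and some gray noise.
sample = [
    [250, 240, 30, 245],
    [235, 20, 25, 90],
    [200, 180, 15, 255],
]

if __name__ == "__main__":
    print(render(to_line_art(sample)))
```

A lower threshold keeps faint gray noise out of the result but can break thin strokes, so it is usually worth trying a few values on a sample page.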

The AI in OCR programs is not perfect. Most OCR programs can deal with text fairly well, but columns throw some of them. You might have to select specific areas of text for conversion. File formats shouldn’t have too much to do with the accuracy of the OCR, although the simpler the format the better. B/W text should

A simple test picture of some text that I made came out to these file sizes:

bitmap 24 bit color 200*200 120kB
bitmap 16 bit color 200*200 20kB
bitmap b/w 200*200 8kB
JPEG standard 200*200 3.71kB
GIF non interlaced 4.00kB

While the GIF is slightly larger than the JPEG, the GIF may offer cleaner edges for the OCR if dithering is not selected.
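As a rough cross-check on the uncompressed bitmap sizes in the list above, the pixel data alone can be estimated as width x height x bits per pixel / 8. The sketch below assumes that simple formula and ignores file headers, palettes and row padding, so real files will be somewhat larger. The function name is made up for the example.

```python
def raw_bitmap_bytes(width, height, bits_per_pixel):
    """Estimate pixel-data size in bytes (no header, palette or row padding)."""
    return width * height * bits_per_pixel // 8


# Color depths to compare for a 200*200 image.
depths = [
    ("24-bit color", 24),
    ("16-bit color", 16),
    ("4-bit (16 colors)", 4),
    ("1-bit b/w", 1),
]

for label, bpp in depths:
    print(f"{label}: {raw_bitmap_bytes(200, 200, bpp)} bytes")
```

By this estimate the 24-bit figure of about 120kB matches. The 20kB figure is what 4 bits per pixel (16 colors) would give rather than 16 bits per pixel, so that entry may refer to a 16-color image; that is only a guess from the arithmetic.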


I have used Omnipage (http://www.omnipage.com/omnipage/mac/) and it works well. It can preserve formatting and its AI is fairly good. I once used a standalone Kurzweil reader (http://www.kurzweiledu.com/) at work, until it was removed. I like the algorithms that they use, but it isn't a Mac product.

A final note on output. "I'm looking for a scanner I can use to scan chapters of books, or entire books, onto my computer to mark them in my word processing software with underlining, bold, italics, etc." You might be better off having the OCR (Omnipage perhaps) save the files as editable PDF files, as there are lots of dynamic options and you can then share your marked files with almost anybody.

Eniregnat
Aug 8, 2003, 02:18 PM
I hate to dredge up an old thread, but the link below has some great info on OCR programs and scanners.

http://support.scansoft.com/ocr/