macgrl

macrumors 65816
Original poster
Jul 17, 2008
1,192
5
Hello all,

I have an S1300i and would like to hear others' experiences of the best DPI for B&W and colour scans so that OCR can be used. I don't really like the software that is bundled with the scanner and will be looking to use something else in the future, so any advice about that would be appreciated too.

Do I have to make any changes to the OCR options in the ScanSnap settings, namely the "select OCR option" and the "set marked text as keywords and convert to searchable pdf" settings?

I am presuming that, provided I get a good quality scan, I can run that PDF through OCR software in the future without having to do anything special with the scanner settings at the time of scanning?

Many thanks for any help. I would like to get this right from the start.

I should mention that I will mainly be scanning documents, occasionally containing images but mostly text. Being able to search these for words would be helpful.
 

nexx27

macrumors member
Jul 8, 2012
83
82
I have a ScanSnap iX500. Really great scanner for the Mac.

I like everything about it, including the software, especially on OS X.

What's wrong with your scanner software?
 

jimthing

macrumors 68000
Apr 6, 2011
1,979
1,139
I use the ScanSnap S1500M and use the "auto" settings most of the time, rather than fiddle with it all the time. Though sometimes when I don't want it to auto greyscale, I'll set it to black & white or colour to get a more accurate 'picture' of the thing I'm scanning.

I tend to scan papers quickly, then afterwards batch a load of PDF files through Adobe Acrobat's OCR function, to do word-and-text PDF OCR-ing, as it's so much quicker than doing the OCR-ing within the built-in software while scanning, and gives more accurate results too.

["word-and-text PDFs" just means the recognised words are stored in the PDF file as a layer on top of the original image underneath. Everything stays in the same doc, so there is nothing for the user to worry about; it's essentially just the technology that makes searchable 'picture' PDFs possible]

Firstly, set your language (e.g. en-US vs. en-GB) in Acrobat, and go through the few other settings to make it work best for you, before using the thing properly. Do some initial tests, like checking that searching the PDFs works reasonably well (it's not always flawless, so don't expect perfection all the time).

These ScanSnap machines make life somewhat easier (no paper, hurray!) but don't expect amazingly perfect results on everything you might want scans of. For example, I wanted to scan some old favourite colour magazines I had (yes, taking them apart first for the doc feeder!), and while the results were okay, they certainly weren't exactly as they appeared in their physical form, but reasonably good.

...perhaps the ScanSnap SV600 (bit pricy at £/€/$ ~400-500 though!) may do this better (don't have to rip the mags apart, hence easier and perhaps quicker), although I suspect lighting issues may affect the results more, being open to outside lighting influences, rather than the darkness of normal doc scanners.

Hope this helps anyway.

EDIT:
BTW, I don't use the 'circle/highlight keywords' thing, as it's a bit of a gimmicky idea to me. I just scan all docs, then OCR in Acrobat, then make sure to use a naming scheme that is routine (eg. "2014.12.10.Wed - Bank Name - letter confirming account opening"), and a filing/archiving/backup scheme that works.
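For anyone who wants to generate names in that style automatically, here is a minimal Python sketch. The function name `scan_filename`, its parameters, and the `.pdf` default are my own assumptions for illustration; only the "date.day - sender - description" pattern comes from the example above.

```python
from datetime import date

# Fixed English abbreviations so the result does not depend on the system locale.
DAY_NAMES = ("Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun")


def scan_filename(day: date, sender: str, description: str, ext: str = "pdf") -> str:
    """Build a name like '2014.12.10.Wed - Bank Name - letter ... .pdf'.

    Hypothetical helper: the year.month.day prefix makes files sort
    chronologically in a plain folder listing.
    """
    stamp = f"{day.year:04d}.{day.month:02d}.{day.day:02d}.{DAY_NAMES[day.weekday()]}"
    return f"{stamp} - {sender} - {description}.{ext}"


print(scan_filename(date(2014, 12, 10), "Bank Name", "letter confirming account opening"))
# prints: 2014.12.10.Wed - Bank Name - letter confirming account opening.pdf
```

Putting the year first is what keeps the files in date order when sorted by name.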

I have found that 'archiving' files away from 'current' files is the best way of handling data without being overwhelmed when looking through my everyday folder system, as keeping old folders full of old and finished-with files clogs up everyday data management.
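If you'd rather not do the archiving by hand, the idea could be scripted roughly as follows. This is only a sketch under my own assumptions: the function name `archive_old_files`, the one-year cut-off, and the use of the file's modification time as the test for "old" are all illustrative choices, not something the workflow above prescribes.

```python
import shutil
import time
from pathlib import Path


def archive_old_files(current: Path, archive: Path, max_age_days: int = 365) -> list[Path]:
    """Move files not modified for `max_age_days` from `current` into `archive`.

    Hypothetical helper: only looks at files directly inside `current`
    (no sub-folders) and never overwrites a file already in the archive.
    """
    cutoff = time.time() - max_age_days * 86400  # seconds in a day
    archive.mkdir(parents=True, exist_ok=True)
    moved = []
    for item in sorted(current.iterdir()):
        if not item.is_file() or item.stat().st_mtime >= cutoff:
            continue  # skip folders and anything recent enough
        target = archive / item.name
        if target.exists():
            continue  # leave name clashes for manual review
        shutil.move(str(item), str(target))
        moved.append(target)
    return moved


# Example call (paths are placeholders):
# archive_old_files(Path.home() / "Scans", Path.home() / "Scans Archive")
```

Try it on a copy of a folder first, since it moves files rather than copying them.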
 

macgrl

macrumors 65816
Original poster
Jul 17, 2008
1,192
5
Hello,

Thank you for your replies.

With the ScanSnap software I found the results to be a bit inaccurate, so I am looking for something better, like Adobe or ABBYY FineReader.

For your documents in B&W, what DPI do you find produces the best results for OCR?

Is it a case of the higher the DPI the better, or is there a level beyond which there is no real gain, or even problems created for the OCR software?

I was thinking of using 400 or 600 dpi for black and white.

BTW, what is the difference between black and white and greyscale? When would you use greyscale?
 

jimthing

macrumors 68000
Apr 6, 2011
1,979
1,139
For a beginner, start by just using the "Auto" setting, as it automagically picks the best type depending on the content the scanner detects (i.e. it chooses greyscale vs. B&W vs. colour) and usually gives the best result at a low file size.

In simple terms, Greyscale is like Colour in that it gives all the shades from white to black, just as Colour offers the very many shades of colour, whereas B&W is basically all black or nothing, with no shades of grey in between light and dark. Generally I prefer Greyscale, as it's more 'true to life' of the actual physical doc, but sometimes B&W can give certain docs a more suitable result. I'm not an expert here though, so Google it! And anyway, as I said, for me the Auto setting chooses the best option reasonably well most of the time per doc.
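To make the greyscale vs. B&W difference concrete, here is a tiny Python sketch. It assumes the common convention of 0 = black and 255 = white, and uses a simple fixed cut-off of 128, which is my own illustrative choice; I don't know what method the ScanSnap software actually uses.

```python
def to_black_and_white(grey_rows, threshold=128):
    """Turn greyscale pixels (0-255) into pure black (0) or pure white (255).

    Illustrative fixed-threshold conversion only; real scanner software
    may use something smarter.
    """
    return [[255 if px >= threshold else 0 for px in row] for row in grey_rows]


# A made-up 2x4 patch: light paper on the left, dark ink on the right,
# with a couple of in-between greys in the middle.
patch = [
    [250, 200, 120, 30],
    [240, 130, 127, 10],
]

print(to_black_and_white(patch))
# prints: [[255, 255, 0, 0], [255, 255, 0, 0]]
```

The in-between values (120, 130, 127) are forced to one side or the other, which is why B&W can look crisper on plain text but loses the soft shading in photos and faint print.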

Alternatively, do some tests and set-up some Profiles for the various things you wish to scan (I don't tend to use them much though myself).

Higher is NOT always better when scanning, as artefacts that don't appear at lower DPI may well appear at higher settings; hence the "auto" feature.
OCR should work provided the result is humanly readable and in typed text; it won't do any handwriting recognition. OCR essentially recognises printed typefaces/fonts and converts them to text, which is another reason not to scribble all over documents you want to scan and OCR. Keep any handwritten notes on your paper docs to the plain background areas where no typed text appears. It will often read the typing anyway, but it still helps the OCR function work better sometimes.
IMO, Adobe Acrobat is much better at OCR-ing than the built-in software and, as I said, it is better done as a larger single batch in Acrobat after scanning one or more docs first: it's very quick that way.

Trial and error really, and making sure the file sizes at each setting are not too big, unless that doesn't bother you. E.g. if you have to email these docs, higher settings on multi-page docs may well result in a much larger overall file size (though generally, these ScanSnaps keep file sizes very small compared with the results from flatbed scanners).
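As a rough feel for how DPI and colour mode drive size, here is a back-of-the-envelope Python sketch. It works out the raw, uncompressed image data for one page, assuming an A4 sheet (8.27 x 11.69 inches) and the usual 1, 8 and 24 bits per pixel for B&W, greyscale and colour. The function name `raw_page_bytes` is made up for this example, and real PDFs are compressed, so actual files will be much smaller; the point is only the relative growth.

```python
# Bits stored per pixel in each mode (uncompressed).
BITS_PER_PIXEL = {"bw": 1, "greyscale": 8, "colour": 24}


def raw_page_bytes(dpi, mode, width_in=8.27, height_in=11.69):
    """Uncompressed size in bytes of one scanned page (default page: A4)."""
    pixels = round(width_in * dpi) * round(height_in * dpi)
    return (pixels * BITS_PER_PIXEL[mode] + 7) // 8  # round up to whole bytes


for dpi in (300, 400, 600):
    sizes = {mode: raw_page_bytes(dpi, mode) / 1_000_000 for mode in BITS_PER_PIXEL}
    print(dpi, "dpi:", {mode: f"{mb:.1f} MB" for mode, mb in sizes.items()})
```

Doubling the DPI quadruples the pixel count, so going from 300 to 600 dpi means roughly four times the raw data per page before compression.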
 