
gpspad

macrumors 6502a
Original poster
Feb 4, 2014
696
47
I don't do a lot of document scanning, but I need some software to convert short newspaper clippings that were scanned as JPG files into text files. I looked at Abbyy FineReader, but it costs too much for the number of times I am going to use it.

I have an Epson flatbed scanner, but never noticed any OCR software with it. Any suggestions for good low-cost OCR software for casual use?
 
I use VueScan https://www.hamrick.com
Check if it is anything for you. A little cheaper than what you mentioned.
Maybe too much, if you want the OCR version.
I don't use it much. But the upgrades were still free :)
I bought a license ~10 yrs ago. Pretty certain I didn't pay even 25% of what they charge today.
 
Thanks for the reply. I already have scanner software for my Epson scanner. You also need the pro version to use OCR, so that itself costs $100. It's a lot for a non-work home project, and I won't be using it much after I finish converting the clippings.
 
Right there with you. I’ve been using it for 15 years or so. Great product that saw me happily through a bunch of scanners. The pro version with OCR is a bargain—and it’s $10 off right now.

ETA: Scanner Pro, an iOS app, is all I use now for documents because I get better scans and faster scans using my iPhone than using a scanner. It has an OCR function but I haven’t tried it. I think it’s $3.99.
 
Yes, true. I use an iPad instead, but I have Scanner Pro too. Unfortunately, I can’t recall if I used the OCR function in it either. But for scanning documents, absolutely.
Great catch there; I didn’t think of it, since he asked for software to use with the scanner.
Late hrs in Europe. :rolleyes:
 
I have a project that is a bunch of old newspaper clippings that were scanned to a USB drive from microfilm. I'd like to convert the clippings to text files. There is no scanner involved, just JPG files.

It doesn't have to be 100% accurate, just accurate enough to save some of the typing.
 
As others mentioned, if not married to desktop, Microsoft has their Office Lens app. Free, on iOS and Android, and has OCR capabilities. At minimum, you would need a free OneDrive account to offload scans, but it might have Dropbox integration as well, since the other Office mobile apps support it.

EDIT: Oops! Totally missed the jpeg to text part.

EDIT2: appears that Google Drive has jpeg to OCR via desktop browsers. Microsoft OneNote does too, but PC only (no Mac version yet, from appearances?). LibreOffice has an extension that is in a beta-ish status.
 
Check out VueScan to see if it can open and convert your jpgs. There’s a free trial.

The only standalone OCR software I’ve used is Abbyy and yes, it’s expensive.

You also could look at third party services.
 

Attachments

  • CCI15092015_9.jpg
    CCI15092015_9.jpg
    3 MB · Views: 225
  • CCI15092015_9.pdf
    3 MB · Views: 317
^ https://business.tutsplus.com/tutorials/how-to-ocr-documents-for-free-in-google-drive--cms-20460

I couldn't get a sample image to open up in Google Docs, but I converted the image to PDF and got a pretty good result.

Google Docs link - https://goo.gl/6uF9Ko
Google Docs is a great idea.

My off-the-wall idea was to use dictation to read the clippings and convert them to text that way. Either the built in dictation capability in iOS (isn’t that on OSX also—it’s too late for me to recall) or something like Dragon Naturally Speaking on iOS. I’ve subscribed to that for a month when I’ve had projects where it would take too long to type the text and would be easy to speak it.
 
I tried converting a few of the clips with VueScan and the iOS Scanner Pro and wasn't happy with the results; it would have been faster to type them. I think you're on to something with the dictation for my purposes. The scans are not always that great, so there will still be some retyping, but there will be less of it.

Thanks, I got some ideas to try.
 
I know this is too expensive, but I just wanted to say that ABBYY FineReader is a very efficient solution for accurate results!
This is the one I personally use with PDFZone.
 
I have tried a number of the solutions above with the sample provided. OCR is the killer. The contrast between the newsprint and the black type is low, the characters have missing pixels, and there is no sharp demarcating line between the characters and the background as the ink just spread out from the letters into the newsprint.

Google Docs just converted it to a PDF. Adobe Acrobat DC was successful in creating a PDF, but when I turned on text recognition it couldn't find any text. Abbyy FineReader seems to be a Windows-only solution. I couldn't figure out how to open a file in VueScan to try the OCR option. My standby app, the granddaddy of OCR on the Mac, Readiris (now 17), won't even try to recognize the file. By opening the file in Photoshop and changing the black and white points in a levels adjustment layer I got one app to do the OCR conversion, but the result was pretty much useless. I don't remember which app now.

If someone can actually do an image to plain text OCR conversion from this file with reasonable results I'd be very interested to see it.
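
For anyone without Photoshop, the black/white point adjustment described above comes down to a linear stretch of the grey values. Here is a minimal sketch in plain Python, assuming 8-bit greyscale pixel values; the function name `apply_levels`, the sample row, and the chosen black and white points are all made up for illustration, and reading or writing the actual JPG would still need an imaging library.

```python
def apply_levels(pixels, black_point, white_point):
    """Stretch 8-bit grey values so black_point maps to 0 and white_point to 255."""
    if not 0 <= black_point < white_point <= 255:
        raise ValueError("need 0 <= black_point < white_point <= 255")
    span = white_point - black_point
    out = []
    for value in pixels:
        # Clamp to the chosen range, then rescale linearly to 0..255.
        clamped = min(max(value, black_point), white_point)
        out.append(round((clamped - black_point) * 255 / span))
    return out


# Hypothetical row of pixels: grey newsprint (around 170) with faded ink (around 90).
row = [170, 165, 90, 95, 172, 88, 168]
stretched = apply_levels(row, black_point=90, white_point=170)
print(stretched)
```

This only widens the gap between ink and paper. It cannot restore missing pixels or undo ink bleed, so it may not be enough on its own for a clipping like the sample.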
 
If it’s for a single project, I would use a free trial period to get it done. If you can find something that offers a free trial period.

Just get everything ready for that step. Then do all the OCR during the trial period.
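
One way to get ready is to keep a worklist of which clippings still need text. A rough sketch, assuming the JPGs sit in one folder and each finished clipping gets a `.txt` file with the same base name; the helper name `pending_clippings` and the demo file names are hypothetical.

```python
import tempfile
from pathlib import Path


def pending_clippings(folder):
    """List JPG files in folder that do not yet have a .txt file of the same name."""
    folder = Path(folder)
    jpgs = sorted(
        p for p in folder.iterdir() if p.suffix.lower() in {".jpg", ".jpeg"}
    )
    return [p for p in jpgs if not p.with_suffix(".txt").exists()]


# Demo on a throwaway folder; point the function at the real scan folder instead.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("clip1.jpg", "clip2.JPG", "clip2.txt", "notes.doc"):
        Path(tmp, name).touch()
    todo = [p.name for p in pending_clippings(tmp)]
    print(todo)
```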
 
Scanner Pro does have OCR functionality in it and it worked pretty well for me when I used it in the past, but didn't work well at all with the example clipping here. I just took a photo of my screen and it got most of the words wrong. The newspaper clipping was too tough for it to handle and it also took a long time.
 
Richdmoore, did you try it with the example provided? My limited tests would seem to indicate that OCR is not going to work with such a poor original copy.

I just tried the result above (Didn't notice the files until you pointed them out.)

Poor results as well with PDFScanner:

PDFScanner

Using the Photo above with OCR:

MRS ELLA SKEEN*GARVER
DIES 0F SLGOD POISGNING
Mm. Ella JV Skeewfiarven wife are ‘ Léwts Hg Ciarmr sum daugmer m by ¥man Skew. dim} at. 4 o‘clock mm
morning at. her home? 478 *{wenty-nmt firm? A chug? was born, {,0 her 0:: lumen 29,} 1,116 little mm dying 2 a m , afian Biaod poisaning set in and mm surprising quieknesa Draught me serr‐ ond death to the tamm T h e mower and baby win he bunefi in the same make:
* Mm Carvar mm ham in Plan; City m1 Fabruary 13;, 13:5. and: wan; am e:
the mast helmed women a: mat; aac‐ tlamant. Shea lemma 3.nusham, tour chiidren, the 01633: 1&3 am! young“; :3gearsaid. father and mama,eight bmtham am: 812; siatem,

Using the PDF file above:

MRS ELLA SKEEN*GARVER
DIES 0F SLGOD POISGNING
Mm. Ella JV Skeewfiarven wife are ‘ Léwts Hg Ciarmr sum daugmer m by ¥man Skew. dim} at. 4 o‘clock mm
morning at. her home? 478 *{wenty-nmt firm? A chug? was born, {,0 her 0:: lumen 29,} 1,116 little mm dying 2 a m , afian Biaod poisaning set in and mm surprising quieknesa Draught me serr‐ ond death to the tamm T h e mower and baby win he bunefi in the same make:
* Mm Carvar mm ham in Plan; City m1 Fabruary 13;, 13:5. and: wan; am e:
the mast helmed women a: mat; aac‐ tlamant. Shea lemma 3.nusham, tour chiidren, the 01633: 1&3 am! young“; :3gearsaid. father and mama,eight bmtham am: 812; siatem,
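
To put a rough number on how far off output like this is, one option is to compare it with a hand-typed reference. A minimal sketch using Python's standard `difflib`; the reference headline is my own guess at what the clipping says, and the ratio is a general similarity score rather than a formal character error rate.

```python
from difflib import SequenceMatcher


def similarity(ocr_text, reference_text):
    """Return a 0..1 similarity ratio between OCR output and a reference transcription."""
    return SequenceMatcher(None, ocr_text, reference_text).ratio()


ocr_headline = "DIES 0F SLGOD POISGNING"
# Assumed reading of the headline, typed by hand; not confirmed against the original.
reference_headline = "DIES OF BLOOD POISONING"

score = similarity(ocr_headline, reference_headline)
print(f"headline similarity: {score:.2f}")
```

Typing a short reference for one or two clippings and scoring each app against it would show quickly whether fixing the OCR output saves any time over typing from scratch.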
 
I tried the jpg with Abbyy FineReader. The first output was a hot mess. The second, AFTER improving the contrast in Photoshop, is attached. Newspaper clippings with so many proper names, small margins and such poor quality are at the outer limits of OCR.

Which brings me to ask the OP: why do you need these converted to text? What's your intended use? For me, the effort to correct proper names would drive me nuts, no matter what OCR method I used. Personally, I think I could type this faster and more accurately than I could OCR it. Or better yet, hire a student to type it . . . .

converted.jpg
 
I stand corrected.
What's the basis for this statement? You've tested all of the common OCR apps?
Pretty much, yes. There aren't too many OCR apps for the Mac to begin with, so it's not as difficult as it sounds. I'll grant you that I stopped looking at new apps about two years ago, because I never found anything that worked better.

Abbyy Finereader isn't perfect, but it's pretty damn good. Unfortunately, this isn't an area that warrants a whole lot of development time for consumers. We simply don't need OCR because we all have access to digital originals now, so developers aren't spending time improving the technology.
 