Using Automator / AppleScript to run OCR retrospectively

duffel7 · Aug 21, 2012

Hi, I've begun an attempt to go paperless with my MBP, ScanSnap S1300i, PDFScanner, Hazel, TextExpander and a lot of coffee, and I've set up naming and filing workflows for all my new household papers.

I now want to go back and work with the existing docs that I have collected (in no particular order over the years). Many of these have not been OCR'ed and as I write I can't be sure of what file extensions I am going to encounter.

Can anyone help me put together something in Automator that will open non-OCR docs in the PDFScanner app, 'Select All', run 'Recognize Text' from the menu and then save the file? Can this be done?
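
Before any OCR app is involved, the backlog has to be narrowed down to files that are really PDFs and that look like they have no text layer. A minimal sketch of that triage step, assuming a plain Python 3 install: the function names are made up for illustration, and the "/Font" test is only a crude heuristic (a PDF that stores its objects in compressed streams can be flagged even though it already contains text), so treat the result as a list of candidates to check rather than a verdict.

```python
import os

PDF_MAGIC = b"%PDF-"


def is_pdf(path):
    """Return True if the PDF header appears in the first 1024 bytes,
    whatever the file extension says."""
    with open(path, "rb") as f:
        head = f.read(1024)
    return PDF_MAGIC in head


def probably_needs_ocr(path):
    """Crude guess (an assumption, not a real text-layer check):
    no '/Font' anywhere in the raw bytes suggests an image-only scan."""
    with open(path, "rb") as f:
        data = f.read()
    return b"/Font" not in data


def find_ocr_candidates(root):
    """Walk root and collect PDFs that look like image-only scans."""
    candidates = []
    for folder, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(folder, name)
            try:
                if is_pdf(path) and probably_needs_ocr(path):
                    candidates.append(path)
            except OSError:
                # Unreadable file: skip it rather than abort the whole scan.
                continue
    return sorted(candidates)
```

Calling find_ocr_candidates on the folder that holds the old scans returns a sorted list of paths, which could then be handed to whichever OCR app ends up being scriptable.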

Has anyone else started this retrospective task and learned any useful lessons?

Thanks in advance!

omr · Jun 20, 2014

I see this is an old post, but since I haven't found a more recent answer or post on this, I thought I'd bring it back to life, as I have the same or a similar question.

My goal is more about automatically running OCR on PDFs that have not yet been OCR'ed as I download them from various sites. I might also apply an automated task to already existing files, as the original post asked, and either way the goal is the same.

I have found that an AppleScript has been provided for a folder action that uses PDFpen to perform the OCR:
AppleScript by Smile Software (PDFpen) via MacSparky (Mac Power Users) http://macsparky.com/blog/2009/5/24/pdfpen-ocr-folder-action-script.html

I wondered, however, whether I could substitute an OCR engine that I, and likely the OP since he has a ScanSnap, already have: ABBYY FineReader (included in the ScanSnap software).
I'm not familiar with AppleScript, so I don't know whether I could simply swap the application name in the script to point to the application I want to use instead, in this case ABBYY FineReader.

...I've also noticed numerous Google search hits for automating/scripting Adobe Acrobat Pro to perform OCR.

FreakinEurekan · Jun 20, 2014

omr said:
I wondered, however, whether I could substitute an OCR engine that I, and likely the OP since he has a ScanSnap, already have: ABBYY FineReader (included in the ScanSnap software).
I'm not familiar with AppleScript, so I don't know whether I could simply swap the application name in the script to point to the application I want to use instead, in this case ABBYY FineReader.

Probably not quite that simple a substitution. Based on that script, it's clear that the PDFpenPro application exposes "methods" (not sure what AppleScript calls them) for open and ocr to let Automator tell the application what to do. I don't know if ABBYY does the same. I don't have ABBYY, and a quick search on their website didn't show any mention of Automator support.
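
One way to get a first hint about whether an installed app is scriptable is to look in its bundle's Info.plist for the NSAppleScriptEnabled and OSAScriptingDefinition keys. The sketch below is only that, a hint: the function name and the example path are assumptions, and an app can lack both keys yet still be scriptable through older mechanisms, so opening the app's dictionary in Script Editor remains the real test.

```python
import os
import plistlib


def scripting_hints(app_path):
    """Read Contents/Info.plist from an app bundle and report the two
    keys that usually signal AppleScript support."""
    plist_path = os.path.join(app_path, "Contents", "Info.plist")
    with open(plist_path, "rb") as f:
        info = plistlib.load(f)

    enabled = info.get("NSAppleScriptEnabled", False)
    # Some bundles store the flag as a string such as "YES".
    if isinstance(enabled, str):
        enabled = enabled.strip().upper() in ("YES", "TRUE", "1")

    return {
        "applescript_enabled": bool(enabled),
        # Name of the bundled scripting definition file, if declared.
        "sdef": info.get("OSAScriptingDefinition"),
    }
```

For example, scripting_hints("/Applications/SomeOCRApp.app") (a hypothetical path) returns a small dictionary. If the flag is False and no sdef is listed, swapping the application name into the PDFpen script is unlikely to work as-is.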

omr · Jun 21, 2014

That makes sense. Thanks for the info!
