
nope7308

macrumors 65816
Original poster
Oct 6, 2008
1,040
537
Ontario, Canada
I have a scanned PDF of a book chapter that I would like to make searchable/editable. The problem is that the book was scanned two pages at a time, so that one page of the PDF will show two pages of the book. If it helps, here is a crappy illustration:

[ [~~~][~~~] ]

Outer brackets represent one PDF page, two sets of inner brackets represent the two scanned book pages.

I have Adobe Acrobat 9, but the OCR text recognition tool treats each PDF page as a single object, so it fails to register the two book pages. That is, it cannot identify when one book page ends and the other begins, so it fails to recognize the text and reorient the page.

I hope that makes sense. It's incredibly difficult to describe without a Goddamn picture. Someone - anyone - please help me with this. And teach my supervisor how to use a ****ing scanner.

Thanks.
 
To test if this would even help you... Have you tried cropping a page down to one book page per PDF sheet, then running OCR?

My thinking is you'd have to duplicate every PDF sheet that has two book pages and crop the PDF sheet so that only one book page shows for each PDF sheet. If the only reason OCR is having difficulty is because there are two book pages per PDF sheet, then that may fix it for you.
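
If cropping every sheet by hand gets tedious, the same duplicate-and-crop idea can be scripted. Here's a rough sketch, not a polished tool. It assumes Python with the pypdf library installed, that every PDF sheet is an unrotated two-page spread, and that the split falls exactly down the middle. The function names (split_boxes, split_spreads) and the file names are made up for illustration.

```python
def split_boxes(left, bottom, right, top):
    """Return (left_half, right_half) rectangles for one two-page spread.

    Each rectangle is (x0, y0, x1, y1) in PDF points. Assumes the gutter
    between the two book pages sits exactly at the horizontal midpoint.
    """
    mid = (left + right) / 2
    return (left, bottom, mid, top), (mid, bottom, right, top)


def split_spreads(src_path, dst_path):
    """Write a new PDF in which every spread becomes two single pages."""
    # Imported here so split_boxes can be used without pypdf installed.
    from pypdf import PdfReader, PdfWriter
    from pypdf.generic import RectangleObject

    # Open the source twice so the left and right halves come from
    # separate page objects and cropping one cannot affect the other.
    left_reader = PdfReader(src_path)
    right_reader = PdfReader(src_path)
    writer = PdfWriter()

    for left_page, right_page in zip(left_reader.pages, right_reader.pages):
        box = left_page.mediabox
        left_half, right_half = split_boxes(
            float(box.left), float(box.bottom), float(box.right), float(box.top)
        )
        for page, half in ((left_page, left_half), (right_page, right_half)):
            # Shrink both the media box and the crop box to the chosen half.
            page.mediabox = RectangleObject(half)
            page.cropbox = RectangleObject(half)
            writer.add_page(page)

    with open(dst_path, "wb") as fh:
        writer.write(fh)


if __name__ == "__main__":
    # Hypothetical file names; replace with your own.
    split_spreads("scanned_chapter.pdf", "scanned_chapter_split.pdf")
```

The output should have twice as many sheets as the input, in book order (left page, then right page). You'd then run Acrobat's OCR on the split file. If the scans are crooked or the gutter is off-center, the midpoint guess in split_boxes would need adjusting.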
 
Does Adobe have an OCR function? I haven't tried using Adobe for OCR scanning. I bought a license for a professional OCR scanning application instead. OCR readers like that are usually compatible with both 32- and 64-bit binaries, which is very convenient.
 