s4yunkim

macrumors regular
Original poster
Feb 6, 2009
168
32
Hi all,

I'm in the process of trying to scan a Korean book into an eBook.

Being Korean-American, I can read/write Korean just fine, but English is my native language, so errors in the OCR don't pop out at me the way errors in English text would. I'm having to go through it line by line to make sure everything has been recognized properly.

Two problems I've been coming across in the process: the spacing of the letters in the book is a bit narrower than normal (so the OCR program will recognize "ABC DEF GHI JKL" as "ABCDEFGHIJKL"), and small dashes/ticks on characters are sometimes missed, so it will mix up vowels like "ㅣ ㅏ ㅑ", where just one little tick makes a difference.
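As a rough illustration of the vowel mix-up described above, here is a minimal Python sketch that flags syllables whose vowel is one of the easily confused ones, so those spots can be checked against the scan first. It relies on the standard Unicode layout of precomposed Hangul syllables; the function names and the choice of confusable set (just the three vowels mentioned above) are assumptions for illustration, not part of any OCR program.

```python
HANGUL_BASE = 0xAC00      # first precomposed Hangul syllable
HANGUL_COUNT = 11172      # number of precomposed syllables
# The 21 medial vowels in Unicode order, as compatibility jamo (U+314F..U+3163).
VOWELS = [chr(0x314F + i) for i in range(21)]

# Assumed confusable set: the three vowels named in the post.
CONFUSABLE = {chr(0x3163), chr(0x314F), chr(0x3151)}  # ㅣ, ㅏ, ㅑ


def medial_vowel(ch):
    """Return the vowel of a precomposed Hangul syllable, or None otherwise."""
    offset = ord(ch) - HANGUL_BASE
    if 0 <= offset < HANGUL_COUNT:
        # Each syllable is encoded as (lead * 21 + vowel) * 28 + tail.
        return VOWELS[(offset // 28) % 21]
    return None


def flag_syllables(line):
    """List (position, syllable, vowel) for syllables worth a second look."""
    flagged = []
    for i, ch in enumerate(line):
        vowel = medial_vowel(ch)
        if vowel in CONFUSABLE:
            flagged.append((i, ch, vowel))
    return flagged
```

This only narrows down where to look; it cannot tell whether the OCR actually got the vowel wrong.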

I realize this is probably a margin of error that I am going to have to deal with when it comes to OCRing documents, but I was wondering if there is an editor out there that will let me go through it line by line, putting a text box of what has been OCRed under the original image, so that I can compare one line at a time rather than whole paragraphs, like this:

screenshot.jpg


And when I press enter, it would pop up the same box on the next line, and so on.


I'm using Adobe Acrobat 10 and ReadIris 12, both for Mac, to do the OCR. Any suggestions would be greatly appreciated! Thanks! :)
 
Any word processor with a dictionary will do the job. If it has a grammar checker, so much the better.
 
I realize that I can put it into a word processor but I'd like to be able to compare it line by line, in an overlay fashion...

In my experience, even Korean-made word processors don't do a great job of finding such errors...
 
Look. There is no magic to this. No computer program can read the mind of the author. Run the OCRed file through your word processor. Then print out the document. Sit at a table with the OCRed printout, the original, and a pen. Carefully go through them glyph by glyph.
 
I have been looking for exactly the same thing.

Consider having to proof hundreds of pages of old 17th-century Latin documents full of abbreviations... Proofing OCRed text is a headache if you have to go back and forth between two sheets of paper, or between two screen windows: you keep losing track of which line or word you were on. For texts that you can't proof just by knowing the language well enough to spellcheck as you read (texts with lots of proper names, for example), going back and forth between two windows or sheets of paper quickly becomes a nightmare.

If the original image and the OCRed text were one above the other, line by line, differences would pop up.
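For what it's worth, the layout itself is not hard to fake. Below is a minimal Python sketch that builds a single HTML page with each line's scan sitting directly above an editable text box holding the OCRed text. It assumes the page has already been cut into one image per line and that the OCR output has one text line per image; the function name and the file names in the usage note are hypothetical.

```python
from html import escape


def build_proof_page(line_images, ocr_lines):
    """Return HTML with each line image stacked above its OCRed text."""
    if len(line_images) != len(ocr_lines):
        raise ValueError("need exactly one OCR line per line image")

    rows = []
    for n, (img, text) in enumerate(zip(line_images, ocr_lines), start=1):
        rows.append(
            '<div class="line">\n'
            f'  <img src="{escape(img, quote=True)}" alt="scan of line {n}"><br>\n'
            # The text box holds the OCR result so it can be corrected in place.
            f'  <input type="text" size="80" value="{escape(text, quote=True)}">\n'
            '</div>'
        )

    return (
        '<!DOCTYPE html>\n'
        '<html><head><meta charset="utf-8"></head><body>\n'
        + "\n".join(rows)
        + "\n</body></html>\n"
    )
```

Something like `build_proof_page(["line_001.png", "line_002.png"], first_two_ocr_lines)` written out to a file and opened in a browser gives the stacked view, and Tab moves to the next text box. Saving the corrections back out is not handled here.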

There seems to be software that does this on PC, but I haven't found any for Mac. I would be very curious to know if you find anything; please post something here if you do!
 