Tools for proofing OCR'ed documents

Discussion in 'Mac Apps and Mac App Store' started by s4yunkim, Mar 20, 2011.

  1. s4yunkim macrumors regular

    Joined:
    Feb 6, 2009
    #1
    Hi all,

    I'm in the process of trying to scan a Korean book into an eBook.

    Being Korean-American, I can read/write Korean just fine, but English is my native language, so errors in the OCR don't pop out at me the way errors in English text would. As a result, I'm having to go through it line by line to make sure things have been recognized properly.

    Two problems I've been coming across in the process are that the spacing of the letters in the book is a bit narrower than normal (so the OCR program will recognize "ABC DEF GHI JKL" as "ABCDEFGHIJKL") and that small dashes/ticks on characters are sometimes missed, so it will mix up vowels like "ㅣ ㅏ ㅑ", where just one little tick makes a difference.
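
    A minimal Python sketch of one way to narrow the search for the vowel mix-ups described above. It decomposes each precomposed Hangul syllable with the standard Unicode arithmetic and flags the ones whose vowel is ㅣ, ㅏ or ㅑ, so those syllables can be checked against the scan first. The names `medial_vowel` and `flag_confusable_vowels` and the choice of confusable set are assumptions made for illustration; they are not part of Acrobat, ReadIris or any other package mentioned in this thread.

```python
# Precomposed Hangul syllables occupy U+AC00..U+D7A3 and are laid out as
#   code = 0xAC00 + (initial * 21 + medial) * 28 + final
HANGUL_BASE = 0xAC00
HANGUL_LAST = 0xD7A3

# The 21 vowels in Unicode order, as compatibility jamo U+314F..U+3163.
MEDIALS = "".join(chr(c) for c in range(0x314F, 0x3164))

# Assumed set of easily confused vowels: ㅣ, ㅏ, ㅑ.
CONFUSABLE = {"\u3163", "\u314f", "\u3151"}


def medial_vowel(ch):
    """Return the vowel of a precomposed Hangul syllable, or None otherwise."""
    code = ord(ch)
    if not (HANGUL_BASE <= code <= HANGUL_LAST):
        return None
    return MEDIALS[((code - HANGUL_BASE) // 28) % 21]


def flag_confusable_vowels(line):
    """List (position, syllable, vowel) for syllables worth a second look."""
    flagged = []
    for position, ch in enumerate(line):
        vowel = medial_vowel(ch)
        if vowel in CONFUSABLE:
            flagged.append((position, ch, vowel))
    return flagged
```

    This only points at places to look; it cannot tell whether the OCR picked the right vowel.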

    I realize this is probably a margin of error that I am going to have to deal with when it comes to OCRing documents, but I was wondering if there is an editor out there that will let me go through it line by line, putting a text box of what has been OCRed under the original image, so that I can compare one line at a time rather than whole paragraphs at a time, like this:

    [IMG]

    And when I press enter, it would pop up the same box on the next line, and so on.
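
    A rough Python sketch of the press-enter-to-advance workflow described above, for the text side only. It shows each OCRed line, waits for input, keeps the line if Enter is pressed on an empty reply, and otherwise stores the typed correction. Displaying the matching strip of the scanned page above each line is left out, since that depends on how the page image was split into lines. The function name `proof_lines` and its `ask` parameter are hypothetical, not taken from any existing editor.

```python
def proof_lines(ocr_lines, ask=input):
    """Step through OCRed lines one at a time and collect corrections.

    `ask` is any function that takes a prompt and returns the reply;
    it defaults to the built-in input() for interactive use.
    """
    corrected = []
    for number, line in enumerate(ocr_lines, start=1):
        print(f"[{number}] {line}")
        reply = ask("fix> ")
        # An empty reply (just pressing Enter) accepts the line unchanged.
        corrected.append(reply if reply else line)
    return corrected
```

    For example, `proof_lines(open("page1.txt", encoding="utf-8").read().splitlines())` would walk through a hypothetical `page1.txt` one line per Enter press.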


    I'm using Adobe Acrobat 10 and ReadIris 12, both for Mac, to do the OCR. Any suggestions would be greatly appreciated! Thanks! :)
     
  2. MisterMe macrumors G4

    Joined:
    Jul 17, 2002
    Location:
    USA
    #2
    Any word processor with a dictionary will do the job. If it has a grammar checker, so much the better.
     
  3. s4yunkim thread starter macrumors regular

    Joined:
    Feb 6, 2009
    #3
    I realize that I can put it into a word processor, but I'd like to be able to compare it line by line, in an overlay fashion...

    In my experience, even Korean-made word processors don't do a great job of finding such errors...
     
  4. MisterMe macrumors G4

    Joined:
    Jul 17, 2002
    Location:
    USA
    #4
    Look. There is no magic to this. No computer program can read the mind of the author. Run the OCRed file through your word processor. Then print out the document along with the original. Sit at a table with the OCRed document, the original, and a pen. Carefully go through the text glyph by glyph.
     
  5. kolia macrumors newbie

    Joined:
    Mar 26, 2011
    #5
    I have been looking for exactly the same thing.

    Consider having to proof hundreds of pages of old 17th-century Latin documents full of abbreviations... Proofing OCRed text is a headache if you have to go back and forth between two sheets of paper, or between two screen windows: you keep losing track of which line or word you were on. For texts where you can't just proof by knowing the language well enough to spellcheck as you read, for example texts with lots of proper names, going back and forth between two windows or sheets of paper quickly becomes a nightmare.

    If the original image and the OCRed text were one above the other, line by line, differences would pop up.

    There seems to be software that does this on PC, but I haven't found any for Mac. I would be very curious to know if you find anything; please post something here if you do!
     
