Which Mac for OCR of large documents?

Discussion in 'Buying Tips and Advice' started by moreforus, Jul 16, 2017.

  1. moreforus macrumors newbie

    Joined:
    Nov 16, 2011
    #1
    Hello everyone,

    I have a lot of scanned PDF files (typical size around 200 MB) that I would like to convert to searchable PDFs with the OCR part of DEVONthink Pro Office. With my old Mac Pro 1,1 it takes forever.

    I am looking at either a MacBook Pro or a 27-inch iMac. Which one would be best for this task, and how fast will it be compared to my Mac Pro?

    Thanks

    PS. Sorry for my English, it is not my mother tongue.
     
  2. ApolloBoy macrumors 6502a

    ApolloBoy

    Joined:
    Apr 16, 2015
    Location:
    San Jose, CA
    #2
    Just looking at benchmarks, either computer will smoke the old Mac Pro. But if you're looking for something with the best performance I'd say get a 27" iMac with the i7, that should make short work of making your PDFs searchable.
     
  3. jerwin, Jul 16, 2017
    Last edited: Jul 16, 2017

    jerwin macrumors 65816

    Joined:
    Jun 13, 2015
    #3
    A lot of what I do on my Mac is OCR. I use ABBYY FineReader Pro, so I'm more or less familiar with how such things work on my machine (2014 iMac 5K, with an i5/M290X/24 GB RAM). DEVONthink uses the ABBYY FineReader engine, so I figured it would be an interesting look into how it uses my computer's resources. I'm working with a demo copy of DEVONthink, so ... I have 150 hours to test things...

    Do you have an example document?

    I found a document from my stash that is 193 MB, and 384 pages, but it may differ materially from the documents you are interested in.

    http://resources.metmuseum.org/reso.../The_Metropolitan_Museum_Journal_v_3_1970.pdf
    One thing I've noticed, besides the slowness, is that DevonScribbler uses 4 threads, 96 percent CPU, and about 211 MB of memory--so it's not really using the resources that are available. After it's finished, I'll run it through FineReader Pro, and compare.
     
  4. kohlson macrumors 6502a

    Joined:
    Apr 23, 2010
    #4
    There are several possible bottlenecks behind the slow performance you're experiencing: how fast the PDF is read from disk, how fast it is processed by the CPU, and how fast it can be written back to disk. Other factors include memory contention (not enough memory) and whether the CPU processing is single-threaded or multithreaded. You can use Utilities/Activity Monitor to help you understand where some of the bottlenecks are.

    Offhand, I would start with the disk as a bottleneck. MP 1,1 is very slow compared to today's SSD Macs.
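    To check the disk side of this outside Activity Monitor, a minimal sketch like the one below times a sequential read of a file. The function name `read_throughput_mb_s` and the 8 MB chunk size are just illustrative choices, and a file that was read or written recently may be served from the OS cache, which overstates what the drive itself can do.

```python
import os
import tempfile
import time


def read_throughput_mb_s(path, chunk_size=8 * 1024 * 1024):
    """Read a file sequentially and return the observed throughput in MB/s."""
    total_bytes = 0
    start = time.perf_counter()
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            total_bytes += len(chunk)
    elapsed = time.perf_counter() - start
    # Guard against a zero elapsed time on very small files.
    return total_bytes / 1e6 / max(elapsed, 1e-9)


if __name__ == "__main__":
    # Demo on a throwaway 64 MB file. Because it was just written, this
    # read mostly measures the OS cache, not the physical drive.
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(os.urandom(1024 * 1024) * 64)
    try:
        print(f"{read_throughput_mb_s(tmp.name):.0f} MB/s")
    finally:
        os.remove(tmp.name)
```

    Pointing it at one of the large scanned PDFs right after a reboot gives a more honest number. If that number is far above the rate at which the OCR job actually consumes the file, the disk is probably not the limiting factor.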
     
  5. moreforus thread starter macrumors newbie

    Joined:
    Nov 16, 2011
    #5
    Thank you all for your answers.
    --- Post Merged, Jul 17, 2017 ---
    I really appreciated your remarks and suggestions.
    Jerwin, the documents I scan are generally textbooks about physics or maths. Like your document, they contain between 300 and 800 pages, and are scanned at 200 to 300 dpi depending on the quality of the print.
    Concerning the speed of OCR, it takes for example 4 hours to convert a 775-page, 280 MB document to searchable PDF. The converted file's size is 410 MB.
    During the OCR, devonscribbler used 100% CPU, as can be seen in Activity Monitor.

    Kohlson, on my computer the system is on an SSD with a write speed of 170 MB/s and a read speed of 240 MB/s. It seems that only one core is used during the OCR process.
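    As a rough sanity check on these numbers, the arithmetic can be sketched as below. It assumes the 280 MB input is read once and the 410 MB output is written once, both sequentially at the full SSD speeds quoted above; the function names are made up for the illustration.

```python
def ocr_rates(pages, input_mb, hours):
    """Return (seconds per page, input MB consumed per second)."""
    seconds = hours * 3600
    return seconds / pages, input_mb / seconds


def sequential_io_seconds(input_mb, output_mb, read_mb_s, write_mb_s):
    """Time to read the input and write the output once, at full drive speed."""
    return input_mb / read_mb_s + output_mb / write_mb_s


sec_per_page, mb_per_sec = ocr_rates(pages=775, input_mb=280, hours=4)
io_seconds = sequential_io_seconds(280, 410, read_mb_s=240, write_mb_s=170)

print(f"{sec_per_page:.1f} s per page")                  # 18.6 s per page
print(f"{mb_per_sec * 1000:.1f} KB/s of input consumed")  # 19.4 KB/s
print(f"{io_seconds:.1f} s of pure disk I/O out of {4 * 3600} s")  # 3.6 s
```

    Under those assumptions the disk is busy for only a few seconds of the 4 hours, which fits with the 100% CPU reading: the job looks limited by the processor rather than by the SSD.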
     

  6. kohlson macrumors 6502a

    Joined:
    Apr 23, 2010
    #6
    I believe your MP has SATA II disk interfaces, limited to 3 Gbps. Most disk interfaces now are twice that fast. Apple's MBP SSD interfaces are several times faster. Off the top of my head, either 20 or 40 Gbps.
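    To put the SATA figures in the same units as the drive speeds reported in post #5, here is a small conversion sketch. It relies on SATA using 8b/10b line encoding (10 bits on the wire per 8 bits of data); the function name is illustrative, and real drives land somewhat below these ceilings because of protocol overhead.

```python
def sata_payload_mb_s(line_rate_gbps):
    """Convert a SATA line rate in Gbps to its payload ceiling in MB/s."""
    # 8b/10b encoding: only 8 of every 10 transmitted bits carry data.
    data_bits_per_s = line_rate_gbps * 1e9 * 8 / 10
    return data_bits_per_s / 8 / 1e6


for name, rate in [("SATA II", 3.0), ("SATA III", 6.0)]:
    ceiling = sata_payload_mb_s(rate)
    share = 240 / ceiling  # 240 MB/s is the read speed measured in post #5
    print(f"{name}: {ceiling:.0f} MB/s ceiling, 240 MB/s is {share:.0%} of it")
# SATA II: 300 MB/s ceiling, 240 MB/s is 80% of it
# SATA III: 600 MB/s ceiling, 240 MB/s is 40% of it
```

    So the measured 240 MB/s is already close to what SATA II can deliver, and a newer interface would raise that ceiling. Whether that helps the OCR job depends on whether the job ever reads fast enough to reach it.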

    Single-threaded apps are generally limited by the clock speed of a single processor core. This means that a processor with a higher clock speed will deliver better performance than a slower one with more cores. Other factors such as cache size also have an effect on performance.

    How all this comes together to improve application throughput is hard to know from afar. The best way to understand this is to try a sample on a similar system.
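    One standard way to picture this is Amdahl's law, which is a textbook model and not something measured on the OCR engine here. In the sketch below, `parallel_fraction` is a hypothetical input: the share of the work that can be spread across cores.

```python
def amdahl_speedup(parallel_fraction, cores):
    """Ideal speedup from extra cores when only part of the work is parallel."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


# A purely single-threaded job gains nothing from more cores.
print(amdahl_speedup(0.0, 8))            # 1.0
# A job that is 90% parallel gets about 3x from 4 cores, not 4x.
print(round(amdahl_speedup(0.9, 4), 2))  # 3.08
```

    If the OCR process really does keep only one core busy, the first case is the relevant one, and per-core speed matters much more than core count.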
     
  7. jerwin macrumors 65816

    Joined:
    Jun 13, 2015
    #7
    Could you provide a more representative PDF to benchmark with? Just so things are comparable.
     
  8. dollystereo macrumors 6502a

    dollystereo

    Joined:
    Oct 6, 2004
    Location:
    France
    #8
    OCR is something that a GPU can do very fast. I don't know if your software can use the built-in GPU, but I have seen CUDA code that can be really fast, 30x faster than CPU-based OCR.
     
  9. jerwin macrumors 65816

    Joined:
    Jun 13, 2015
    #9
    I'm pretty sure that ABBYY is holding back performance improvements so that it can sell its high-throughput versions at a higher cost.
     
