
moreforus

macrumors newbie
Original poster
Nov 16, 2011
8
0
Hello everyone,

I have a lot of scanned PDF files (typical size around 200 MB) that I would like to convert to searchable PDFs with the OCR component of DEVONthink Pro Office. With my old Mac Pro 1,1 it takes forever.

I am looking at either a MacBook Pro or a 27-inch iMac. Which one would be best for this task, and how much faster would it be than my Mac Pro?

Thanks

PS. Sorry for my English; it is not my mother tongue.
 
Just looking at benchmarks, either computer will smoke the old Mac Pro. But if you're looking for the best performance, I'd say get a 27" iMac with the i7. That should make short work of making your PDFs searchable.
 
A lot of what I do on my Mac is OCR. I use ABBYY FineReader Pro, so I'm more or less familiar with how such things work on my machine (2014 iMac 5K, with an i5/M290X/24 GB RAM). DEVONthink uses the ABBYY FineReader engine, so I figured it would be interesting to see how it uses my computer's resources. I'm working with a demo copy of DEVONthink, so ... I have 150 hours to test things...

Do you have an example document?

I found a document from my stash that is 193 MB and 384 pages, but it may differ materially from the documents you are interested in.

http://resources.metmuseum.org/reso.../The_Metropolitan_Museum_Journal_v_3_1970.pdf
One thing I've noticed, besides the slowness, is that DevonScribbler uses 4 threads, 96 percent CPU, and about 211 MB of memory, so it's not really using the resources that are available. After it's finished, I'll run the document through FineReader Pro and compare.
 
There are several possible bottlenecks behind the slow performance you're experiencing: how fast the PDF is read from disk, how fast it is processed by the CPU, and how fast the result can be written back to disk. Other factors include memory pressure (not enough memory) and whether the CPU processing is single-threaded or multithreaded. You can use Activity Monitor (in Applications/Utilities) to help you understand where the bottlenecks are.

Offhand, I would start with the disk as a bottleneck. The MP 1,1 is very slow compared to today's SSD Macs.
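
If you want a rough number for the disk side before blaming it, a small script can time a sequential read of one of the scanned PDFs. This is only a sketch: the function name and the chunk size are my own choices, and a file that was read recently may come out of the OS cache and look much faster than the disk really is.

```python
import time


def measure_read_throughput(path, chunk_size=8 * 1024 * 1024):
    """Read a file sequentially and return the observed throughput in MB/s.

    Hypothetical helper for illustration; chunk_size is an arbitrary choice.
    """
    total_bytes = 0
    start = time.perf_counter()
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            total_bytes += len(chunk)
    elapsed = time.perf_counter() - start
    # Guard against a zero interval on very small files.
    return (total_bytes / 1_000_000) / max(elapsed, 1e-9)
```

Calling `measure_read_throughput("/path/to/scan.pdf")` (the path is a placeholder) gives a figure you can compare with what Activity Monitor shows under Disk while the OCR is running.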
 
Thank you all for your answers.
I really appreciated your remarks and suggestions.
Jerwin, the documents I scan are generally physics or maths textbooks. Like your document, they run from 300 to 800 pages, and they are scanned at 200 to 300 dpi depending on the quality of the print.
As for OCR speed, converting a 775-page, 280 MB document to a searchable PDF takes about 4 hours, for example. The converted file is 410 MB.
During the OCR, DevonScribbler used 100% CPU, as can be seen in Activity Monitor.
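
To put that in per-page terms, here is the arithmetic as a few lines of Python. The only inputs are the figures above (775 pages, about 4 hours); the function name is just for illustration.

```python
def ocr_rate(pages, hours):
    """Return (seconds per page, pages per hour) for one OCR run."""
    seconds_per_page = hours * 3600 / pages
    pages_per_hour = pages / hours
    return seconds_per_page, pages_per_hour


sec_per_page, pages_per_hour = ocr_rate(pages=775, hours=4)
print(f"{sec_per_page:.1f} s/page, {pages_per_hour:.0f} pages/hour")
# prints "18.6 s/page, 194 pages/hour"
```

That per-page figure is a handy baseline to compare against a test run on a newer machine.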

Kohlson, on my computer the system is on an SSD with write speeds of 170 MB/s and read speeds of 240 MB/s. It seems that only one core is used during the OCR process.
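
A quick sanity check on the disk side, using the numbers in this thread (280 MB read, 410 MB written, 240 MB/s read, 170 MB/s write, about 4 hours in total). It assumes each file is read once and written once sequentially; if the OCR engine also writes temporary files, the real I/O time will be higher.

```python
def disk_share(read_mb, write_mb, read_mb_s, write_mb_s, total_seconds):
    """Return (seconds of sequential disk I/O, fraction of the total run)."""
    read_s = read_mb / read_mb_s
    write_s = write_mb / write_mb_s
    io_s = read_s + write_s
    return io_s, io_s / total_seconds


io_seconds, share = disk_share(280, 410, 240, 170, 4 * 3600)
print(f"{io_seconds:.1f} s of disk I/O, {share:.3%} of the run")
# prints "3.6 s of disk I/O, 0.025% of the run"
```

Under that assumption the disk accounts for a few seconds out of four hours, which fits with the 100% CPU reading.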
 

Attachments

  • devonthink ocr.jpg (227.4 KB)
I believe your MP has SATA II disk interfaces, limited to 3 Gbps. Most disk interfaces now are twice that fast. Apple's MBP SSD interfaces are several times faster again: off the top of my head, either 20 or 40 Gbps.
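
For reference, SATA line rates convert to usable MB/s like this. SATA uses 8b/10b encoding, so every data byte costs 10 bits on the wire; the helper name is just for illustration, and real drives land somewhat below these ceilings.

```python
def sata_payload_mb_s(line_rate_gbps):
    """Usable bandwidth in MB/s for a SATA link, which uses 8b/10b encoding."""
    # 8b/10b sends 10 bits on the wire for every 8 data bits (1 byte).
    return line_rate_gbps * 1000 / 10


for name, gbps in [("SATA II", 3.0), ("SATA III", 6.0)]:
    print(f"{name}: {sata_payload_mb_s(gbps):.0f} MB/s")
# prints "SATA II: 300 MB/s" then "SATA III: 600 MB/s"
```

So the 240 MB/s read speed reported above is already close to what a SATA II link can carry.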

A single-threaded app is generally limited by the clock speed of a single core. This means that for such an app, a processor with faster cores will deliver better performance than one with more, slower cores. Other factors such as cache size also have an effect on performance.

How all this comes together to improve application throughput is hard to know from afar. The best way to find out is to try a sample on a similar system.
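
One way to see why extra cores do not help a single-threaded job is Amdahl's law: the speedup from n cores is 1 / ((1 - p) + p / n), where p is the fraction of the work that can run in parallel. The sketch below uses made-up values of p purely for illustration; they are not measurements of DEVONthink or ABBYY.

```python
def amdahl_speedup(parallel_fraction, cores):
    """Speedup predicted by Amdahl's law for a given parallel fraction."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


# Illustrative parallel fractions only, not measured values.
for p in (0.0, 0.5, 0.95):
    print(f"parallel fraction {p:.2f}: {amdahl_speedup(p, 4):.2f}x on 4 cores")
# prints 1.00x, 1.60x and 3.48x respectively
```

With p = 0 (fully single-threaded) the speedup stays at 1x no matter how many cores you add, so only a faster core helps.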
 
Could you provide a more representative PDF to benchmark with? Just so things are comparable.
 
OCR is something that a GPU can do very fast. I don't know if your software can use the built-in GPU, but I have seen CUDA code that can be really fast, 30x faster than CPU-based OCR.
 
I don't know if your software can use the built-in GPU, but I have seen CUDA code that can be really fast, 30x faster than CPU-based OCR.

I'm pretty sure that Abbyy is holding back performance improvements so that it can sell its high throughput versions at a higher cost.
 