Originally Posted by osx-addict
I'm coming in a bit late to this thread, but the topic still seems relevant. I recently purchased the Pro Office version of DEVONthink specifically for its OCR feature, to use with our recently bought ScanSnap S1300. It replaces our aging but still functioning ScanSnap fi-5110EOX, which has no decent Mac drivers that would work seamlessly with DTPO. Anyway, we are slowly trying to become mostly paperless: scanning bills and medical statements, importing bank e-statements directly (they're much smaller than the scanned equivalents), etc.
There are a few things I'm hesitant to destroy after scanning: deeds in particular, or papers from the local county/state office with an official seal, and obviously NOT birth certificates! What do you all do with those more 'official' documents?
By the way, for those of you who have the S1500, is the scanning speed pretty decent? We are currently using the S1300 (the smaller sibling of the S1500), and it's pretty slow compared to our older fi-5110EOX ScanSnap, which was pretty darned fast (I think the S1500 is more or less a newer version of our old fi-5110). Luckily I picked up this S1300 on Craigslist for $150, so the out-of-pocket cost wasn't bad; the old fi-5110 was close to $800 about 7-8 years ago! Ouch!
As for DTPO, I'm using one database on the Mac's second drive and have imported ALL of my documents into it. I think I've converted most of the PDFs scanned by the old scanner into OCR'd "PDF + Text" equivalents. Interestingly enough, the file sizes are HUGELY different between the originals and the OCR'd versions. With our old scanner I was using ScanTango at 200 DPI, and under the last few versions of OS X it created HUGE documents: a 10-page document could easily be 100 MB with no OCR, while that same document after DTPO converted it was perhaps 1-2 MB. That's a huge difference.
Anyway, my DTPO database is a bit over 25 GB, although it will probably shrink by a gig or two once I empty the DTPO trash and the original (larger) pre-OCR PDFs are deleted.
A few questions for you all:
1) Is there a way to have newly scanned documents named automatically, without a popup dialog box? Currently, after OCR finishes, I get a popup dialog asking for the filename + timestamp, and DTPO stops and waits until "OK" is pressed before OCR'ing the next document in the queue.
2) I've noticed that when I convert a document to an OCR'd equivalent, the new version is occasionally a bit fuzzier than the original. Is there a way to adjust this? There have been a few times when I've considered keeping the non-OCR'd version of a document because of this fuzziness.
Also, now that you're building your databases, please make sure you back them up from time to time! I'm personally using CrashPlan to back up my stuff offsite (encrypted, of course).
With regard to the fuzziness you noticed in the post-OCR images, I think you actually answered your own question earlier in your post.
You noted that file sizes drop substantially after OCR. The reason is that the resulting PDF contains only a low-resolution image, intended primarily for verifying the accuracy of the OCR if any question comes up later, and that lower resolution is also why the OCR'd version looks fuzzier. This makes your documents quick to index and small to store.
Also, regarding certain official documents, we have to accept that we can't be truly paperless at this point. After scanning, all my documents go into a 9x12x5 safe deposit box at my bank. I pay $50 a year to not have to worry about the documents being destroyed at home if anything were to happen.