
jerwin

Suspended
Original poster
Jun 13, 2015
I have a collection of 19th-century magazines (over a thousand pages), which I've been digitizing with my ScanSnap SV600. I am not aware of any professional database that indexes this magazine title.

I would like to use this collection to translate excerpts, write articles, and develop interactive web applets.

Most of the articles I'm interested in build on previous articles. I think the best way to organize this complicated web of cross references is a database: it would help me plan my republication of this trove and buy the missing issues as they appear on a certain auction site.
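As an illustration of what such a database might look like, here is a minimal sketch using SQLite through Python's built-in `sqlite3` module. It records issues, articles and the cross references found in them, then lists the issues that are cited but not yet owned, ranked by how often they are cited, which doubles as a shopping list. The table and column names and the sample rows are made up for the example.

```python
import sqlite3

# Hypothetical schema; table and column names are illustrative only.
SCHEMA = """
CREATE TABLE issue (
    id     INTEGER PRIMARY KEY,
    year   INTEGER NOT NULL,
    number INTEGER NOT NULL,
    owned  INTEGER NOT NULL DEFAULT 0,   -- 1 if the issue is in the collection
    UNIQUE (year, number)
);
CREATE TABLE article (
    id       INTEGER PRIMARY KEY,
    issue_id INTEGER NOT NULL REFERENCES issue(id),
    title    TEXT NOT NULL
);
-- One row per cross reference found in an article, e.g. "see No. 12, p. 45".
CREATE TABLE cites (
    citing_id      INTEGER NOT NULL REFERENCES article(id),
    cited_issue_id INTEGER NOT NULL REFERENCES issue(id),
    page           INTEGER
);
"""

def missing_issues(conn: sqlite3.Connection) -> list[tuple[int, int, int]]:
    """(year, number, times_cited) for cited issues not yet owned, most-cited first."""
    return conn.execute("""
        SELECT i.year, i.number, COUNT(*) AS times_cited
        FROM cites AS c
        JOIN issue AS i ON i.id = c.cited_issue_id
        WHERE i.owned = 0
        GROUP BY i.id
        ORDER BY times_cited DESC, i.year, i.number
    """).fetchall()

if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    # Made-up sample rows, only to exercise the query.
    conn.executemany("INSERT INTO issue VALUES (?, ?, ?, ?)",
                     [(1, 1878, 1, 1), (2, 1878, 2, 0), (3, 1878, 3, 1), (4, 1878, 4, 0)])
    conn.executemany("INSERT INTO article VALUES (?, ?, ?)",
                     [(10, 1, "Article A"), (11, 3, "Article B")])
    conn.executemany("INSERT INTO cites VALUES (?, ?, ?)",
                     [(10, 2, 17), (11, 2, None), (11, 4, 45), (11, 1, 3)])
    print(missing_issues(conn))  # [(1878, 2, 2), (1878, 4, 1)]
```

The cross reference points at an issue rather than an article on purpose: when the cited issue is missing, you usually only know its number and page from the citing text.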

At present, they are filed away in Finder folders like so:

[Image: flatfile.jpg]


This is problematic for what I want to do.

I own FineReader Pro, and I also use Tesseract when the FineReader Automator plugins fail me, which is regrettably often.
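For the Tesseract fallback, a small sketch that OCRs each page image into its own searchable PDF by calling the `tesseract` command-line tool. Assumptions: `tesseract` is on the PATH, the pages are PNG or TIFF files, and the language is passed with `-l`. `eng` is only a placeholder default, since the magazines' language isn't stated; substitute the right language code (e.g. `fra` or `deu`) if that language pack is installed. The function names are hypothetical.

```python
import subprocess
from pathlib import Path

IMAGE_SUFFIXES = {".png", ".tif", ".tiff"}

def tesseract_pdf_cmd(image: Path, out_base: Path, lang: str = "eng") -> list[str]:
    """Build a tesseract call that writes <out_base>.pdf: the page image plus a text layer."""
    # Usage is: tesseract IMAGE OUTPUTBASE [options] [configfile...]
    # The trailing "pdf" config selects searchable-PDF output; tesseract adds ".pdf".
    return ["tesseract", str(image), str(out_base), "-l", lang, "pdf"]

def ocr_folder(folder: Path, lang: str = "eng") -> list[Path]:
    """OCR every PNG/TIFF page in folder, one PDF per page; return the PDFs written."""
    written = []
    for image in sorted(folder.iterdir()):
        if image.suffix.lower() in IMAGE_SUFFIXES:
            out_base = image.with_suffix("")  # page01.png -> page01
            subprocess.run(tesseract_pdf_cmd(image, out_base, lang), check=True)
            written.append(image.with_suffix(".pdf"))  # page01.pdf
    return written
```

`check=True` makes a failed page raise `subprocess.CalledProcessError` instead of being skipped silently, so you notice which pages still need attention.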

So, could DEVONthink help me construct the indices, concordances and mind maps necessary to comprehend this collection at a glance?
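Whatever tool ends up holding the files, a basic concordance (every word, the pages it appears on, and a little surrounding context, often called keyword-in-context) is easy to build from the OCR text yourself. A minimal sketch, assuming the OCR output has already been exported as plain text per page and keyed by a hypothetical page label such as `"1878-02 p.17"`:

```python
import re
from collections import defaultdict

WORD = re.compile(r"[^\W\d_]+")  # runs of letters, accented letters included

def build_concordance(pages: dict[str, str], width: int = 30) -> dict[str, list[tuple[str, str]]]:
    """Map each lower-cased word to (page label, keyword-in-context snippet) pairs."""
    index = defaultdict(list)
    for label, text in pages.items():
        flat = " ".join(text.split())  # collapse OCR line breaks and runs of spaces
        for m in WORD.finditer(flat):
            start, end = m.span()
            snippet = flat[max(0, start - width):end + width]
            index[m.group().lower()].append((label, snippet))
    return dict(index)

def print_concordance(conc: dict[str, list[tuple[str, str]]]) -> None:
    """Print words alphabetically, each followed by its pages and contexts."""
    for word in sorted(conc):
        print(word)
        for label, snippet in conc[word]:
            print(f"    {label}: ...{snippet}...")
```

OCR'd magazines often hyphenate words across line breaks; this sketch does not rejoin them, so "en-" and "gine" would appear as separate entries.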

And does DEVONthink work with SVG files, CAD files, and Scrivener files?

What are the limitations, in practice, of the various DEVONthink versions? Can I rely on FineReader instead of using the OCR in the Pro Office version?
 
Hello, I can help with some of your questions, but not all of them. I use DT Pro Office, which is the highest edition, so I'm not sure what is included in each of the others, but I can explain my workflow to you.

I have almost all the science fiction pulps up to around 1960 in PDF, and I regularly add other material, such as very old Jules Verne books (Spanish editions). And as a subscriber to Asimov's Science Fiction and Analog SFF, I "scan" those issues too, to keep them in my databases as references.

What I do is:
  1. Capture the magazine with my iPad (or get it from the internet). I use Magzter for my purchases when there is no real PDF edition or I can't get the paper edition at a reasonable price (I live in The Netherlands, but I'm Spanish).
  2. Apply some filters to the captured images: normally I crop the margins, convert to 256 grey levels, and then apply an "Improve Details" filter. I use XnView for this. I do it to keep the final PDF small: this way I get a very readable PDF with a relatively small file size. For the Verne facsimiles I generate two versions, a full-resolution one and a "reading" one.
  3. Then I use PDF Expert to compose a PDF from the generated images. You can use any other tool for that. I use it because I bought it a long time ago and it works perfectly for composing PDFs from images: it does not change the resolution or size, and it uses a good compression level.
  4. As a SetApp subscriber, I then use PDF Pen to OCR the resulting PDF.
  5. Finally, I drop the finished PDF into a very complex folder tree in iCloud Drive, which is indexed by DT.
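Step 3 (composing one PDF from the page images) can also be scripted. A sketch, assuming the open-source `img2pdf` command-line tool is installed; it is not part of the workflow above, just one tool that places JPEG pages in the PDF without re-encoding them (other formats are stored losslessly), in the same spirit as keeping resolution and quality unchanged. The file naming (`page_2.jpg` and so on) and function names are hypothetical. The one subtlety is page order: a plain sort puts `page_10` before `page_2`, so the sketch sorts file names numerically.

```python
import re
import subprocess
from pathlib import Path

PAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}

def natural_key(path: Path) -> list:
    """Sort key that puts page_2.jpg before page_10.jpg."""
    # re.split with a capturing group alternates text (even index) and digit runs
    # (odd index), so ints are only compared with ints and strings with strings.
    parts = re.split(r"(\d+)", path.name)
    return [int(tok) if i % 2 else tok.lower() for i, tok in enumerate(parts)]

def img2pdf_cmd(folder: Path, output: Path) -> list[str]:
    """Build an img2pdf command listing the folder's page images in reading order."""
    pages = sorted((p for p in folder.iterdir() if p.suffix.lower() in PAGE_SUFFIXES),
                   key=natural_key)
    return ["img2pdf", *(str(p) for p in pages), "-o", str(output)]

def compose_pdf(folder: Path, output: Path) -> None:
    """Run img2pdf; raises CalledProcessError if it fails."""
    subprocess.run(img2pdf_cmd(folder, output), check=True)
```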
Why do I use PDF Pen for OCR? Because, like PDF Expert, it does not change the image quality: it simply adds the text layer and leaves the images alone. DT Pro Office has its own OCR engine (licensed from ABBYY), but it does not do a good job, because it dramatically decreases the image quality and dramatically increases the file size. For example, a PDF that originally took, say, 70 MB at 300 DPI will take 500 MB or more in DT after OCR at 150 DPI, whereas with PDF Pen it only grows by a couple of KB (the text layer).
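The file-size reasoning in step 2 and in the OCR comparison above can be made concrete with a little arithmetic. In this sketch, `raw_page_bytes` gives the uncompressed size of one page (256 grey levels is 1 byte per pixel, 24-bit colour is 3), and `bytes_per_pixel_growth` applies the illustrative 70 MB / 500 MB figures from the post. The US-letter page size is an assumption (the magazines' dimensions aren't given), and real PDF sizes also depend on compression, which this ignores.

```python
def raw_page_bytes(width_in: float, height_in: float, dpi: int, bytes_per_pixel: int) -> int:
    """Uncompressed bitmap size of one page: pixel count times bytes per pixel."""
    return round(width_in * dpi) * round(height_in * dpi) * bytes_per_pixel

def bytes_per_pixel_growth(old_mb: float, old_dpi: int, new_mb: float, new_dpi: int) -> float:
    """Factor by which storage per pixel grew when a file was re-saved at another DPI."""
    pixel_ratio = (new_dpi / old_dpi) ** 2  # pixel count scales with the square of DPI
    size_ratio = new_mb / old_mb
    return size_ratio / pixel_ratio

# Assumed page size: US letter, 8.5 x 11 in, at 300 DPI (illustration only).
colour = raw_page_bytes(8.5, 11, 300, 3)  # 24-bit colour: 25,245,000 bytes
grey = raw_page_bytes(8.5, 11, 300, 1)    # 256 grey levels: 8,415,000 bytes
# The 70 MB at 300 DPI -> 500 MB at 150 DPI example from the post:
growth = bytes_per_pixel_growth(70, 300, 500, 150)  # about 28.6
```

Halving the DPI should cut the pixel count to a quarter, so a file that grows about sevenfold means each remaining pixel is stored in roughly 28 to 29 times as many bytes. The arithmetic does not say why; that would need a look at how the images are re-encoded.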

I have also used "Cisdem PDF Converter OCR", which is cheap and whose results fall somewhere between DT's and PDF Pen's, but be very careful with its updates, because they tend to break the program.

As for your specific questions: DT does not understand CAD or Scrivener files, beyond making a thumbnail and indexing whatever text they contain. I don't know about SVG files, but I assume the support is similar to that of the macOS Preview app.

DT won't build document indexes or concordances (although I think it can generate an index of files), but it is very good at finding relations between documents, and its search engine is very good. What most distinguishes DT from other document managers is that it can "index" external files (storing only the metadata and leaving the file where it is), it really syncs between devices better than any other tool, and it has some "magic" between the iOS and macOS versions (basically, externally indexed files are available in the iOS versions regardless of whether the file is actually available in its original location).
 
You might try an actual database. It sounds like you might need to create your own entries for things, and generating reports might also be useful. DT uses a database, and the outside files are organized that way, but its customization options are not as powerful as those of a real relational database.
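As an example of the kind of report a relational database makes easy, here is a sketch of a recursive query in SQLite (via Python's `sqlite3`). For one article, it lists every earlier article it builds on, directly or indirectly, with the shortest chain length, which is effectively a background reading list. The `builds_on` table and the function name are hypothetical design choices, and the depth limit of 50 is an arbitrary guard against accidental cycles in the data.

```python
import sqlite3

# Hypothetical tables: builds_on(later_id, earlier_id) means the later article
# continues, answers, or cites the earlier one.
SCHEMA = """
CREATE TABLE article (id INTEGER PRIMARY KEY, title TEXT NOT NULL);
CREATE TABLE builds_on (
    later_id   INTEGER NOT NULL REFERENCES article(id),
    earlier_id INTEGER NOT NULL REFERENCES article(id),
    PRIMARY KEY (later_id, earlier_id)
);
"""

MAX_DEPTH = 50  # arbitrary guard so a cycle in the data cannot recurse forever

def background_reading(conn: sqlite3.Connection, article_id: int) -> list[tuple[int, str, int]]:
    """Every article that article_id builds on, directly or indirectly.

    Returns (id, title, shortest_chain_length), nearest first."""
    return conn.execute("""
        WITH RECURSIVE chain(id, depth) AS (
            SELECT earlier_id, 1 FROM builds_on WHERE later_id = :start
            UNION
            SELECT b.earlier_id, c.depth + 1
            FROM builds_on AS b JOIN chain AS c ON b.later_id = c.id
            WHERE c.depth < :max_depth
        )
        SELECT a.id, a.title, MIN(c.depth) AS min_depth
        FROM chain AS c JOIN article AS a ON a.id = c.id
        GROUP BY a.id
        ORDER BY min_depth, a.id
    """, {"start": article_id, "max_depth": MAX_DEPTH}).fetchall()
```

An article reachable by several chains is reported once, at its shortest distance, because of the `MIN(c.depth)` grouping.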
 