
Silly John Fatty

Original poster
So I'm scanning some books right now with Adobe Scan (basically I'm taking pictures of the books' pages with my iPad, and it crops them automatically and puts them in order), and I'd like to have these scans turned into text so I can search for words, highlight text, and copy text as well.

Does anyone here do this kind of thing, and do you have preferred software for it?

I also wonder how these tools cope with things like graphs or design-heavy pages.
 
I use an app called Textify. It works very well for me: it does OCR conversion on .pdf files as well as images (.png, .jpg, etc.). That said, I have never tried it with more than a 2- or 3-page .pdf, and I'm not sure how it handles charts and graphs.

Textify - Text recognition (OCR) made easy and accurate #1 Product of the Day PH
 
I’ve tested Adobe Scan, and you’d think it would be the best in the industry, but it doesn’t handle this very well. I wonder if there’s anything better.
 
The function you're looking for is called OCR, for Optical Character Recognition.

Adobe Scan/Acrobat has this capability built in.

FWIW, the more recent versions of macOS also have OCR built into Preview as part of the Live Text feature; it can recognize text inside photos in your Photo Library.

Here are Adobe's OCR instructions for Acrobat.

When OCR programs encounter text in a graph or image, they may recognize the text and convert it, but they will likely not know how to handle the entire graph + labels as a single unit, so you may need to recreate it with editable text in whatever layout program you're using.
 
FWIW, the more recent versions of macOS also have OCR built into Preview as part of the Live Text feature; it can recognize text inside photos in your Photo Library.

Thanks, I didn't know this. I looked into it, but it's of no real use for what I need to do. I'm basically scanning some old books, and I have thousands of pages to get through.

Sadly, Adobe already has trouble getting plain, clear text right. I've been using Adobe Scan on my iPad/iPhone as well as the web version of Adobe so far, and I have to say it doesn't really work well, at least for what I do. It's not really made to convert text from books. It works well with invoices, however. I think its problem is that it has difficulties with text that is not on a straight line. So when a page is bent, which is usually the case with an open book, that page will be messed up.

But Adobe is such a big company that I can't believe someone else would have done it better. So if they don't get it right, probably nobody else does. But if someone has experience with Adobe and something else, and thinks the other one is better than Adobe, then I'd be curious to try that out.
 
I use ocrmypdf from the macOS terminal for this. You can install it via Homebrew.

Nota bene: using a scanner (scanning/adding one page at a time to a PDF) will probably give you flatter page scans, as well as higher resolution, which helps quite a bit when adding an OCRed text layer. OCR quality may be poor if the wrong language is used. ocrmypdf also offers a --deskew option, which might improve OCR on PDFs made from your iPad photos.
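
For anyone with thousands of pages, here is a rough sketch of how the ocrmypdf suggestion could be scripted from Python. It only wraps the command-line tool, using the `-l` (language) and `--deskew` options. The function names and the folder layout are made up for illustration, and it assumes ocrmypdf is already installed and that the scans were exported from the scanning app as PDFs. Note that deskewing straightens pages that are slightly rotated; it is not meant to flatten a curved page.

```python
import shutil
import subprocess
from pathlib import Path


def build_ocrmypdf_command(input_pdf, output_pdf, language="eng", deskew=True):
    """Return the ocrmypdf argument list for one PDF (nothing is run here)."""
    # -l selects the OCR language; the wrong language hurts recognition quality.
    cmd = ["ocrmypdf", "-l", language]
    if deskew:
        # --deskew straightens pages that were photographed slightly rotated.
        cmd.append("--deskew")
    cmd += [str(input_pdf), str(output_pdf)]
    return cmd


def ocr_folder(scan_dir, out_dir, language="eng"):
    """Run ocrmypdf on every PDF in scan_dir, writing searchable copies to out_dir.

    Hypothetical helper: the folder names are whatever you choose.
    """
    if shutil.which("ocrmypdf") is None:
        raise RuntimeError("ocrmypdf not found on PATH; install it first")
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    for pdf in sorted(Path(scan_dir).glob("*.pdf")):
        target = out_dir / pdf.name
        # check=True stops the batch if ocrmypdf reports an error for a file.
        subprocess.run(build_ocrmypdf_command(pdf, target, language), check=True)
```

Something like `ocr_folder("scans", "scans_ocr", language="eng")` would then process a whole folder in one go; the language code should match the book, since the corresponding Tesseract language data has to be installed as well.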
 
I think its problem is that it has difficulties with text that is not on a straight line. So when a page is bent, which is usually the case with a book when it's open, that page will be messed up.
This is challenging for all OCR software I've tried. If you're able to give the software a cleaner and straighter image, OCR will work a lot better. If this is a big project and the results are important, you might consider a dedicated book scanner that's set up specifically for this. Might even be possible to rent or borrow one for the duration of your project.

Found this randomly on the web:
 
Even with the absolute cleanest result I could get, Adobe still has issues, so it's not really usable for me. But thanks for pointing me to those book scanners! I'll see if I can use one somewhere; maybe some libraries or universities give you access to one, or I might buy a used one and sell it again later.
 
Proper book scanners aren't cheap: they tend to be in the five-figure range. (The Internet Archive has several for scanning public-domain books into their system.) I would ask around if there's a library / archives services company that could do it.

There are companies you can send your book to for scanning. The cheapest way is destructive (i.e. they slice off the cover and binding so they can autofeed the pages); non-destructive scanning is more expensive.
Here's an article that lists some of the bigger ones:

 