Searching for text within pdf files -- without opening the files?

Discussion in 'Mac Apps and Mac App Store' started by Fishrrman, Jun 25, 2013.

  1. Fishrrman macrumors G3

    Joined:
    Feb 20, 2009
    #1
    Hello -

    I'm looking for a standalone search app that can search for text strings that are contained _within_ pdf files -- WITHOUT first opening the file.

    Adobe Reader can do this, but it's enormous -- like using an atom bomb to kill a fly.

    The app must be able to search independently of indexes created by Spotlight (I do not maintain Spotlight on ANY of my computers -- I have my reasons for that).

    There are other apps that can search for text within text (or other) files. One that comes to mind is "EasyFind". But it _can't_ find text that is contained _within_ a pdf file.

    What I will use this for:
    I have a folder containing hundreds of pdf files, and would like to search for text strings that exist in those files, without having to open each file individually. I want to point my search engine at the folder, type in a text string, and then have it identify those file(s) that contain the string.

    The text I'll be searching for _is_ "text" within the files -- not images, etc. When a file is opened, I can copy the text out "as text". But I have to find the files first!

    Anything out there (other than Acrobat Reader) that can do this?
     
  2. mpainesyd macrumors 6502

    mpainesyd

    Joined:
    Nov 29, 2008
    Location:
    Sydney, Australia
    #2
    I use the search feature of Finder (not Spotlight) to do this all the time. I then use Quick Look (spacebar) to view the contents of the listed pdfs.

    If Finder is not suitable, try the CNET download archive for a utility:
    http://download.cnet.com/mac/
     
  3. Fishrrman thread starter macrumors G3

    Joined:
    Feb 20, 2009
    #3
    "I use the search feature of Finder (not spotlight) to do this all the time. I then use quick-look (spacebar) to view the contents of the listed pdfs."

    Doesn't work for me.

    I'm guessing that's because Spotlight is turned off.
     
  4. mpainesyd macrumors 6502

    mpainesyd

    Joined:
    Nov 29, 2008
    Location:
    Sydney, Australia
  5. onekerato macrumors regular

    Joined:
    Jun 6, 2011
    #5
    DevonThink can search inside PDF files.

    It extracts the text from PDF files and builds its own searchable index (separate from Spotlight), so it can search PDFs fast.

    My guess is that any database-oriented app that stores PDFs, such as Yojimbo, Together, or EagleFiler, should also be quite capable of searching the content inside PDFs without needing Spotlight (since it would be pretty slow using Spotlight).
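
    The "extract the text once, then search the extracted copy" idea can be tried by hand. The sketch below is not how DevonThink or any of these apps is implemented; it only illustrates the general approach. It assumes the pdftotext command-line tool (part of Poppler/Xpdf) is installed, and the function names and folder layout are made up for the example.

```bash
#!/usr/bin/env bash
# Keep a plain-text copy of each PDF so repeated searches only grep text files.
# Assumes `pdftotext` is installed; function and folder names are hypothetical.

build_text_cache() {
    local src=${1%/} cache=${2%/}   # drop any trailing slash
    # -print0 / read -d '' keep file names with spaces intact.
    find "$src" -type f -iname '*.pdf' -print0 |
    while IFS= read -r -d '' pdf; do
        # Mirror the PDF's relative path inside the cache folder.
        local txt="$cache/${pdf#"$src"/}.txt"
        # Convert only when the cached text is missing or older than the PDF.
        if [ ! -e "$txt" ] || [ "$pdf" -nt "$txt" ]; then
            mkdir -p "$(dirname "$txt")"
            pdftotext "$pdf" "$txt" </dev/null
        fi
    done
}

search_text_cache() {
    local cache=$1 needle=$2
    # -r recurse, -l list matching files only, -i ignore case, -F literal string.
    grep -rliF -- "$needle" "$cache"
}

# Example (hypothetical folders):
#   build_text_cache ~/Documents/pdfs ~/.pdf-text-cache
#   search_text_cache ~/.pdf-text-cache "warranty"
```

    Only the first run has to convert every file; later runs convert just the new or changed PDFs, so searching stays fast without Spotlight.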

    Jose
    www.onekerato.com/ebooks.html
     
  6. Fishrrman thread starter macrumors G3

    Joined:
    Feb 20, 2009
    #6
    Thanks for the pointer to DevonThink.

    The "Personal Edition" seems to do what I need without being an enormously bulky application (it's about 45 MB).
     
  7. yamaduc macrumors member

    yamaduc

    Joined:
    Apr 22, 2008
    Location:
    NorCal (Residing in SoCal)
    #7
    Open a terminal.

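    find . -name "*.pdf" -print0 | xargs -0 grep -il "the word you are looking for"

    Hit return. The -print0/-0 pair keeps file names with spaces intact, and -l lists just the names of the matching files.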
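
    One caveat with grepping the raw files: many PDFs store their page text in compressed streams, so a plain grep can miss strings that are visibly there when the file is opened. A rough workaround is to extract the text first and grep that. The sketch below assumes the pdftotext command-line tool (part of Poppler/Xpdf) is installed; the function name search_pdfs is made up for the example.

```bash
#!/usr/bin/env bash
# List every PDF under a folder whose extracted text contains a string.
# Assumes `pdftotext` is installed and on PATH; search_pdfs is a hypothetical name.

search_pdfs() {
    local dir=$1 needle=$2
    # -print0 / read -d '' keep file names with spaces intact.
    find "$dir" -type f -iname '*.pdf' -print0 |
    while IFS= read -r -d '' pdf; do
        # "pdftotext FILE -" writes the extracted text to standard output.
        # grep: -q quiet, -i ignore case, -F treat the needle as a literal string.
        if pdftotext "$pdf" - 2>/dev/null </dev/null | grep -qiF -- "$needle"; then
            printf '%s\n' "$pdf"
        fi
    done
}

# Example (hypothetical folder):
#   search_pdfs ~/Documents/manuals "torque setting"
```

    This re-extracts every file on every search, so it is slower than an indexed app, but for a few hundred PDFs it should be tolerable. Scanned PDFs that contain only images will still not match.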
     
  8. flynz4 macrumors 68040

    Joined:
    Aug 9, 2009
    Location:
    Portland, OR
    #8
    That was going to be my recommendation to you as well. I personally use DevonThink Pro Office (DTPO)... but I use it as a personal database and the heart of my "paperless office". The lighter DT Personal should handle your needs.

    Off on a tangent (just in case any readers have any interest in this area).... DTPO combined with a Fujitsu ScanSnap is an unbelievable combination. Now, whenever paper comes into the house... it is either:

    1. Scanned/shredded
    2. Just shredded
    3. Recycled

    The only remaining category of paper that we keep is official documentation such as birth certificates, real estate titles, etc. Usually things with an official original seal.

    Getting rid of the paper is freeing. I do not think it is feasible long term without a full duplex sheet scanner and a powerful database. This combination is incredible.

    /Jim

    P.S. I am curious why you do not want to use Spotlight indexing. I am wondering if there is some vulnerability or something that I am not aware of.
     
