Need a clever solution

Discussion in 'Mac Apps and Mac App Store' started by The Mercurian, Apr 11, 2012.

  1. The Mercurian (macrumors 65816; joined Mar 17, 2012) #1
    Ok so here is the problem.

    I have an archive of approx. 6000 PDFs. At one point I ran OCR on them using Adobe, with the output saved to a new folder as files named PostOCR-filename.pdf.
    For some reason the folder of post-OCR files is 12 files short of the original PDF count.

    I want to keep only one copy of every file. So basically I want to keep the PostOCR files plus the 12 unknown files that weren't OCR'ed. The problem is: how do I identify those 12 files?

    I mean, I could open two Finder windows and drive myself demented searching for them manually, but I figure there must be a smarter way.

    Anyone have any genius quick-and-easy ideas?
     
  2. balamw (Moderator, Staff Member; joined Aug 16, 2005; location: New England) #2
    Are you afraid of Terminal.app and command lines?

    B
     
  3. The Mercurian (thread starter, macrumors 65816; joined Mar 17, 2012), Apr 11, 2012, last edited Apr 11, 2012 #3
    Not afraid; I'd be willing to have a go at that.

    I haven't really used it in OS X before, but back in the day I used Linux/Unix, so I'm not afraid of command-line stuff.
     
  4. balamw (Moderator, Staff Member; joined Aug 16, 2005; location: New England) #4
    Save this into a text file. Call it something like ~/Desktop/cleanup.sh

    Code:
    #!/bin/bash
    mkdir ./deleteme
    POSTPATH=./PostOCR
    for pdf in *.pdf
    do if [-f "$POSTPATH/PostOCR-$pdf"]
    then
    	mv -i "$pdf" deleteme/
    fi
    
    I've assumed the PostOCR files are in a subfolder of the folder holding the originals (modify POSTPATH if necessary).

    Open Terminal.app and make the originals folder your working directory:

    Code:
    cd ~/Documents/originals
    then

    Code:
    source ~/Desktop/cleanup.sh
    That should move every original that already has a PostOCR- copy (the dupes) into the deleteme folder, which is then easy to wipe out.

    B
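For reference, a sketch of the script above with the fixes worked out later in this thread applied (the closing done, spaces inside [ ]), plus quoted variables so filenames and folder names containing spaces survive. Wrapping it in a function called cleanup_dupes that takes the PostOCR folder as an argument is a hypothetical choice for illustration, not part of the original script.

```bash
#!/bin/bash
# Sketch of the cleanup script with the thread's fixes applied.
# Moves every original PDF that already has a "PostOCR-" copy into
# ./deleteme; whatever stays behind was never OCR'ed.
# cleanup_dupes is a hypothetical wrapper so the folder can be passed in.

cleanup_dupes() {
    # Folder holding the PostOCR-*.pdf files (default: ./PostOCR).
    # Quoted everywhere, so a name with a space such as "Post OCR" works.
    local postpath="${1:-./PostOCR}"
    local pdf

    mkdir -p ./deleteme             # -p: no error if it already exists

    for pdf in *.pdf
    do
        # [ is a command, so it needs a space after it and before ].
        if [ -f "$postpath/PostOCR-$pdf" ]
        then
            mv -i "$pdf" deleteme/  # quoted: filenames may contain spaces
        fi
    done                            # the "done" that closes the for loop
}

# Usage, from inside the originals folder:
#   source ~/Desktop/cleanup.sh
#   cleanup_dupes ./PostOCR
```

mkdir -p replaces the plain mkdir so that a second run doesn't complain about deleteme already existing.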
     
  5. The Mercurian (thread starter, macrumors 65816; joined Mar 17, 2012) #5
    Epic!
    Will give it a go.
    Gotta learn me more of this Terminal stuff.
     
  6. The Mercurian (thread starter, macrumors 65816; joined Mar 17, 2012) #6
    ok I'm getting this error:

    line 9: syntax error: unexpected end of file

    I don't quite understand why?!
     
  7. balamw (Moderator, Staff Member; joined Aug 16, 2005; location: New England) #7
    I forgot the 'done' at the end of the file that closes the for loop.

    Try that.

    B
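"Unexpected end of file" is what bash reports when it reaches the end of a script while a loop is still open. A small sketch showing that bash -n, which parses a script without executing it, catches this before any files get moved; the temporary files broken.sh and fixed.sh are made up for illustration.

```bash
#!/bin/bash
# bash -n parses a script without running it, so a missing "done"
# shows up as a syntax error before anything is touched.

tmpdir=$(mktemp -d)

# Loop without its closing "done" (the mistake from the script above).
cat > "$tmpdir/broken.sh" <<'EOF'
for pdf in *.pdf
do
    echo "$pdf"
EOF

# The same loop, properly closed.
cat > "$tmpdir/fixed.sh" <<'EOF'
for pdf in *.pdf
do
    echo "$pdf"
done
EOF

if bash -n "$tmpdir/broken.sh" 2>/dev/null; then broken_ok=yes; else broken_ok=no; fi
if bash -n "$tmpdir/fixed.sh"; then fixed_ok=yes; else fixed_ok=no; fi

echo "broken.sh parses: $broken_ok"   # no
echo "fixed.sh parses: $fixed_ok"     # yes

rm -r "$tmpdir"
```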
     
  8. The Mercurian (thread starter, macrumors 65816; joined Mar 17, 2012), Apr 11, 2012, last edited Apr 11, 2012 #8
    Ah, fixed that problem, but now I have another: I got 6000 of these
    Edit: OK, fixed that; it just needed a space.

    It runs without complaining now and it creates the directory but it doesn't do anything else ?!
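The space matters because [ is a command in its own right (equivalent to test) and ] has to be its last, separate argument. Without the space, bash looks for a command literally named "[-f", which fails with "command not found"; inside the loop that would plausibly mean one error per PDF. A sketch of the three spellings, using a made-up sample file:

```bash
#!/bin/bash
# [ is an ordinary command (also available as "test"), and ] must be its
# last argument, so both need whitespace around them.

tmpdir=$(mktemp -d)
touch "$tmpdir/sample.pdf"            # hypothetical sample file

# Correct: [ receives the arguments  -f  <path>  ]
if [ -f "$tmpdir/sample.pdf" ]; then spaced=found; else spaced=missing; fi

# Equivalent spelling with the test command.
if test -f "$tmpdir/sample.pdf"; then tested=found; else tested=missing; fi

# Wrong: bash tries to run a command named "[-f"; it is not found
# (exit status 127), so the else branch runs. 2>/dev/null hides the error.
if [-f "$tmpdir/sample.pdf"] 2>/dev/null; then unspaced=found; else unspaced=missing; fi

echo "$spaced $tested $unspaced"      # found found missing
rm -r "$tmpdir"
```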
     
  9. balamw (Moderator, Staff Member; joined Aug 16, 2005; location: New England) #9
    Did you modify POSTPATH appropriately or move those files to a subfolder of the originals path?

    Are you sure about the post OCR filenames? If you didn't just prepend "PostOCR-" or the files are not in the right folder it won't work.

    Can you post a sample of filenames with full path from the originals and PostOCR?

    B
     
  10. The Mercurian (thread starter, macrumors 65816; joined Mar 17, 2012) #10
    Hey, no worries, it works now.

    I had a space in the directory name, as in 'Post OCR'. So I took the space out and it worked perfectly, leaving me with my 12 unidentified files now identified.

    Thanks a million!!!! Much appreciated!!! Saved me work and I learned something!!!!!:D
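Renaming the folder works, but quoting the path works too. A sketch of a dry run, assuming the same PostOCR- naming, that only prints the originals with no post-OCR counterpart (the never-OCR'ed files) and leaves the space in 'Post OCR' alone; the function name list_missing is hypothetical.

```bash
#!/bin/bash
# Dry run: print the originals that have NO "PostOCR-" counterpart,
# instead of moving the duplicates. Nothing is moved or deleted.

list_missing() {
    local postpath="$1"                # e.g. "./Post OCR" (quoted by caller)
    local pdf
    for pdf in *.pdf
    do
        [ -e "$pdf" ] || continue      # no PDFs at all: skip the literal "*.pdf"
        if [ ! -f "$postpath/PostOCR-$pdf" ]
        then
            printf '%s\n' "$pdf"       # this original was never OCR'ed
        fi
    done
}

# Usage, from inside the originals folder (quotes keep the space):
#   POSTPATH="./Post OCR"
#   list_missing "$POSTPATH"
```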

    ----------

    Can I ask you something? I understand the script except for one thing:
    in 'for pdf', is the first pdf an array name you are defining, or does it have some special meaning in bash?
    And is $pdf then a member of the array?
     
  11. balamw (Moderator, Staff Member; joined Aug 16, 2005; location: New England) #11
    Exactly. *.pdf expands to the list of matching filenames, and for then assigns each element of that list in turn to the variable pdf, whose value you read back as $pdf.

    I could have called it file or even rose. Just a placeholder.

    Glad it's done.

    B
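A quick way to see the glob expansion and the loop variable in action, using a placeholder name as in the post above (the sample filenames are made up):

```bash
#!/bin/bash
# The shell expands *.pdf into the matching filenames before the loop
# starts; "for" then assigns each name in turn to the loop variable.
# The variable name is arbitrary: "rose" works as well as "pdf".

tmpdir=$(mktemp -d)
cd "$tmpdir" || exit 1
touch one.pdf two.pdf notes.txt      # made-up sample files

count=0
for rose in *.pdf
do
    echo "this pass: $rose"
    count=$((count + 1))
done
echo "loop ran $count times"         # 2: notes.txt does not match *.pdf

cd "$OLDPWD" && rm -r "$tmpdir"
```

One caveat: if nothing matches, bash by default leaves the pattern as the literal string *.pdf, so the loop runs once with that text; the [ -f ] test in the original script simply skips it.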
     
  12. The Mercurian (thread starter, macrumors 65816; joined Mar 17, 2012) #12
    OK, just to clarify, 'cause I'm slow: when do you use the $ sign and when not? In the for statement there is no dollar sign, but in the do statement there is. Does $ mean 'one entry of a variable array'?
     
  13. balamw (Moderator, Staff Member; joined Aug 16, 2005; location: New England) #13
    Use it when you want the value substituted in, not when you are defining it (see POSTPATH vs $POSTPATH too).

    B
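A small illustration of the rule, reusing the names from the script (the filename report 1.pdf is hypothetical):

```bash
#!/bin/bash
# Assign without $ (the name is the target); use $ where you want the
# value substituted in.

POSTPATH=./PostOCR               # define: no $, no spaces around =
pdf="report 1.pdf"               # hypothetical name; quotes keep the space

target="$POSTPATH/PostOCR-$pdf"  # use: each $NAME is replaced by its value
echo "$target"                   # prints ./PostOCR/PostOCR-report 1.pdf
echo POSTPATH                    # no $, so this prints the word POSTPATH
```

Spaces around = matter too: POSTPATH = ./PostOCR would try to run a command called POSTPATH.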
     
  14. The Mercurian (thread starter, macrumors 65816; joined Mar 17, 2012)
