AppleScript to move and rename PDFs

Discussion in 'Mac Programming' started by kiercardo, Nov 11, 2016.

  1. kiercardo macrumors newbie

    Joined:
    Jan 3, 2013
    Location:
    Rome, IT
    #1
    Hi!
    I downloaded a lot of academic papers whose first page is not very useful, so I would like to move it to the end of each file. I also noticed that the first line is the title of the article, which I would like to use to rename the PDF (the files have random, pretty useless filenames). I think I should use Automator for this, but I'm a complete newbie to automation. Can you help me?
    Thanks



     
  2. organicCPU macrumors 6502a


    Joined:
    Aug 8, 2016
    #3
    There might be cleaner and easier solutions, but here are my thoughts!
    The following code depends on PDFtk Server, a command-line tool licensed under the GNU General Public License that you can use for free as long as you don't sell commercial software with it. If you want to use the code below, you need to install it. Maybe you can also get the task done with the AppleScript library of Preview.app or with another PDF tool like Xpdf, but I have chosen PDFtk Server. We use it for two things:
    1. Make a dump of the file to get the title of the PDF (the title is sometimes missing in the test files I downloaded from JSTOR; more on this later)
    2. Move the first page to the end of the document and write the file
    There is also another command-line tool involved that should already be installed on your Mac: awk. I used awk for the string manipulation that extracts the title from the file dump. Alternatives would be sed or grep.
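    To see the awk step in isolation, here is a small sketch. The dump text is made-up sample data in the InfoKey/InfoValue layout that the extraction relies on, and the function names sample_dump and title_from_dump are only for this example. The first awk prints the line that follows "InfoKey: Title"; the second strips the "InfoValue: " label.

```shell
# Made-up stand-in for the output of: pdftk file.pdf dump_data_utf8
sample_dump() {
  printf '%s\n' \
    'InfoBegin' \
    'InfoKey: Title' \
    'InfoValue: On Widgets' \
    'InfoBegin' \
    'InfoKey: Author' \
    'InfoValue: A. Person' \
    'NumberOfPages: 12'
}

# c is set to 1 on the "InfoKey: Title" line; on the next line the test
# c&&!--c counts it down to 0 and becomes true, so that one line is printed.
title_from_dump() {
  awk 'c&&!--c; /InfoKey: Title/{c=1}' | awk '{ sub("InfoValue: ", ""); print }'
}

sample_dump | title_from_dump   # prints: On Widgets
```

    If the dump has no Title entry, nothing is printed and the result is an empty string.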
    I guess it would be better to write this as a shell script, but here is a one-liner you can copy and paste into a bash session in Terminal after adapting the input and output paths to your needs:
    Code:
    inputfiles=$"/path/to/inputfiles/*pdf" ; outputfiles=$"/path/to/outputfiles/" fileext=".pdf" ; for i in $inputfiles ; do newfilename=$(pdftk "$i" dump_data_utf8 | awk 'c&&!--c; /InfoKey: Title/{c=1}' | awk '{ sub("InfoValue: ", ""); print}') ; if [ -z "$newfilename" ] ; then newfilename="$(basename "$i" $fileext)" ; fi ; fileiterator=0 ; while [ -f "$outputfiles$newfilename$fileext" ] ; do let fileiterator++ ; newfilename="${newfilename%%_*}"_"$fileiterator"; done ; pdftk A="$i" cat A2-end A1 output "$outputfiles$newfilename$fileext" ; done
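    Spread over several lines, the same steps might look like the sketch below. Treat it as an untested-on-real-papers illustration, not a drop-in replacement: the function name move_first_page_and_rename and the PDFTK variable are made up for this example, it assumes every PDF has at least two pages, and it only uses the pdftk operations already shown above (dump_data_utf8 and cat ... output). Three things differ from the one-liner on purpose: every expansion is quoted, so folder paths containing spaces are not split apart; a "/" in a title is replaced with "-" so it cannot be read as a folder separator; and the counter is appended to the full title instead of cutting the title at its first underscore.

```shell
#!/bin/sh
# Set PDFTK to the full path of the binary if pdftk is not on your PATH.
PDFTK=${PDFTK:-pdftk}

# Hypothetical helper: $1 = folder with input PDFs, $2 = output folder.
# The body runs in a subshell ( ... ) so its variables stay local.
move_first_page_and_rename() (
  indir=$1
  outdir=$2
  for pdf in "$indir"/*.pdf; do
    [ -e "$pdf" ] || continue   # the glob matched nothing

    # Title = the InfoValue line right after "InfoKey: Title" in the dump.
    title=$("$PDFTK" "$pdf" dump_data_utf8 |
      awk 'found { sub(/^InfoValue: /, ""); gsub("/", "-"); print; exit }
           /^InfoKey: Title/ { found = 1 }')

    # No title in the PDF: keep the old file name.
    [ -n "$title" ] || title=$(basename "$pdf" .pdf)

    # Append _1, _2, ... until the name is free in the output folder.
    name=$title
    n=0
    while [ -e "$outdir/$name.pdf" ]; do
      n=$((n + 1))
      name="${title}_$n"
    done

    # Pages 2..end first, then page 1.
    "$PDFTK" A="$pdf" cat A2-end A1 output "$outdir/$name.pdf"
  done
)

# Usage (placeholder paths):
#   move_first_page_and_rename "/path/to/inputfiles" "/path/to/outputfiles"
```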
    Files without a title value keep their original filename (a number). If you want, you could use Xpdf to convert the PDF content to plain text and extract the first line. Unfortunately, the test files I have are not homogeneous enough for the first line to always be the expected title, so this could be a little tricky.
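    If you go the plain-text route, picking the first non-empty line could look like this sketch. The function name first_line_title is made up for the example. The commented usage line assumes that pdftotext (part of Xpdf) is installed; it uses the -f and -l page-range options and "-" as the output file to write the text to stdout. The function also replaces "/" and ":" with "-", since those characters cause trouble in Mac filenames.

```shell
# Read plain text on stdin and print the first non-blank line,
# trimmed and with "/" and ":" replaced by "-".
first_line_title() {
  awk 'NF {
    gsub(/^[ \t]+|[ \t]+$/, "")   # trim leading and trailing blanks
    gsub("[/:]", "-")             # characters that are awkward in filenames
    print
    exit
  }'
}

# Usage (placeholder path, assumes pdftotext is installed):
#   pdftotext -f 1 -l 1 "/path/to/inputfiles/one.pdf" - | first_line_title
```

    Whether that line really is the title still has to be checked by eye for each source of papers.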
    If several documents have the same title, I just append a number to the file name. You could extract other values to build a better name.
    Finally, you could call the code from an Automator task or AppleScript, but that's up to you. I hope this helps you accomplish the task.
     
  3. superscape macrumors 6502a


    Joined:
    Feb 12, 2008
    Location:
    East Riding of Yorkshire, UK
    #4
  4. kiercardo thread starter macrumors newbie

    Joined:
    Jan 3, 2013
    Location:
    Rome, IT
    #5
    Thank you for the tips! However, all I get after running the PDFtk script you posted is an empty PDF file (not renamed). How is that possible?
     
  5. organicCPU macrumors 6502a


    Joined:
    Aug 8, 2016
    #6
    That should not happen. As I explained, it is possible that the PDF cannot be renamed, but it shouldn't be empty. I have no idea why this happens, so it could take a while to figure out exactly where the problem is. As I don't know your experience with the Terminal, let's start from the beginning.
    I tested the command again from the source of my last post. Here is what I did.
    1. Copied the code, from inputfiles= through "$outputfiles$newfilename$fileext" ; done, from the post above into an empty TextEdit document.
    2. Opened a Finder window showing the input and output folders.
    3. Dragged the input folder from Finder onto a new line in the TextEdit document that holds the bash command.
    4. Dragged the output folder onto another new line in TextEdit.
    5. In TextEdit, replaced the path in inputfiles=$"/path/to/inputfiles/*pdf" with the dragged input path (only the /path/to/inputfiles part; leave /*pdf intact).
    6. In TextEdit, replaced the path in outputfiles=$"/path/to/outputfiles/" with the dragged output path (leave the trailing slash (/) intact).
    7. Opened Terminal.app (the word bash appears at the top of the window).
    8. Copied the whole command (see step 1) from TextEdit into the Terminal bash window and pressed Enter.
    That's it, and it works. (To be honest, pdftk is not in my PATH environment variable, so I additionally replaced the two occurrences of pdftk with the full path to the pdftk binary.)
    If you followed steps 1 to 8 and still get an empty PDF, we will test the pdftk and awk commands next. If they work as expected, we need to look at the PDF (is it protected?), at the system (are there special characters or whitespace in your input/output path or filenames? A space in the input folder path in particular gets split by the unquoted $inputfiles in the for loop) and at the command itself, to find out what is special about your use case.
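    A first round of checks could look like the sketch below. The function name check_tool is made up for this example, and the paths in the comments are placeholders to replace with one of your own files. It only tells you whether the shell can find awk and pdftk; the commented commands then run each half of the pipeline on a single PDF so you can see which half fails.

```shell
# Report whether a command can be found on PATH (returns non-zero if not).
check_tool() {
  if command -v "$1" >/dev/null 2>&1; then
    printf '%s found at %s\n' "$1" "$(command -v "$1")"
  else
    printf '%s not found on PATH\n' "$1"
    return 1
  fi
}

check_tool awk
check_tool pdftk || echo "use the full path to the pdftk binary in the command instead"

# If both are found, try each half on one file (placeholder paths):
#   pdftk "/path/to/inputfiles/one.pdf" dump_data_utf8 | head -n 20
#   pdftk A="/path/to/inputfiles/one.pdf" cat A2-end A1 output "/path/to/outputfiles/test.pdf"
```

    If the first commented command prints nothing or an error, the problem is with reading the PDF; if only the second one fails, it is with writing the reordered file.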
     
  6. superscape macrumors 6502a


    Joined:
    Feb 12, 2008
    Location:
    East Riding of Yorkshire, UK
    #7
    Hah hah! There's a blast from the past! I was the technical reviewer for the InDesign section. Apologies for any errors. ;-)
     
