Splitting a multipage PDF

Discussion in 'macOS' started by Blackheart, Jan 19, 2006.

  1. Blackheart macrumors 6502a

    Joined:
    Mar 13, 2004
    Location:
    Seattle
    #1
    For work, we need to be able to easily split a multipage PDF. For instance, if I have a 5-page PDF then I'd like to have 5 PDFs output to a folder (hopefully with recognizable names such as "page1.pdf", "page2.pdf", etc.). Manually printing every individual page to PDF is too time-consuming.

    I've found a couple of shareware programs that do this... but they cost money for something that I'd think should be fairly easy to do. I'll accept built-in OS X ideas, freeware ideas, and even command-line ideas (I can script it if I know of the tool).

    EDIT: If it helps, we have Adobe CS2 (Including Acrobat Pro 7)
     
  2. CanadaRAM macrumors G5

    Joined:
    Oct 11, 2004
    Location:
    On the Left Coast - Victoria BC Canada
    #2
    Do you mean do this automatically as a batch operation?

    Acrobat Pro will do this, of course, but manually. I would think that Acrobat plus some clever Applescription....
     
  3. Blackheart thread starter macrumors 6502a

    Joined:
    Mar 13, 2004
    Location:
    Seattle
    #3
    Definitely batch. Manually = bad beans.
     
  4. Blue Velvet Moderator emeritus

    Joined:
    Jul 4, 2004
    #4
    Hey! I was just going to say that! :D

    But there are Batch Processing tools within Acrobat Pro where you can set up your own sequence -- some time spent with these may pay some dividends. It's under 'Advanced'.
     
  5. superbovine macrumors 68030

    Joined:
    Nov 7, 2003
    #5
    How are you getting the original 5-page PDF? Is it generated someplace? That is where I would start. For example, if it comes from a web-based app that generates a PDF, or it's just a 5-page PDF that people fill out and email to you, it might be easier to get them to change that end, depending on what it is.
     
  6. superbovine macrumors 68030

    Joined:
    Nov 7, 2003
    #6
    aha!

    Well, I think I figured it out. I am too lazy to try it, but open Automator.

    Make this workflow:

    1. Get Specified Finder Items (under Finder)

    2. Extract Odd & Even Pages, with extract set to odd (under PDF)

    3. Extract Odd & Even Pages, with extract set to odd
    .
    .
    add more here...
    .

    4. Print Finder Items

    5. Save the workflow

    There will be four permutations of this. You will need to save a workflow for each, or you can make one big giant one.

    1. odd odd
    2. odd even
    3. even even
    4. even odd

    I hope this works.

    If you want a big giant one, just start with step one again after step 4 in the workflow and repeat...

    If it doesn't work, post back here and I'll make it work. I really need to sleep now... usually when I sleep I come up with a better answer...

    EDIT: the above is wrong...

    Bahah, as soon as I hit the pillow... you said 5 pages...

    1. odd odd odd (1st page)
    2. odd even (3rd page)
    3. odd odd even (5th page)
    4. even odd (2nd page)
    5. even even (4th page)
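
    A minimal Python sketch (not an Automator workflow) for checking which original page a given chain of extractions leaves behind. It assumes each "extract odd" or "extract even" step simply keeps every other page of whatever the previous step produced; the function names are made up for illustration.

```python
def extract(pages, which):
    """Keep the odd- or even-positioned pages (positions count from 1)."""
    start = 0 if which == "odd" else 1
    return pages[start::2]


def extract_chain(page_count, steps):
    """Apply a chain of 'odd'/'even' extractions to pages 1..page_count."""
    pages = list(range(1, page_count + 1))
    for which in steps:
        pages = extract(pages, which)
    return pages


if __name__ == "__main__":
    chains = (["odd", "odd", "odd"], ["odd", "even"], ["odd", "odd", "even"],
              ["even", "odd"], ["even", "even"])
    for steps in chains:
        print(" ".join(steps), "->", extract_chain(5, steps))
```

    Under that assumption, for a 5-page document "even odd" leaves page 2 and "even even" leaves page 4.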
     
  7. Blackheart thread starter macrumors 6502a

    Joined:
    Mar 13, 2004
    Location:
    Seattle
    #7
    Anything a little more automatic and scalable? I mentioned a 5-page PDF just as an example. At my work, we'll need to split PDFs daily, with varying numbers of pages. I just want a script where I select the file, click GO and bada-bing bada-boom, multiple PDFs.

    The PDFs come out of Adobe InDesign.
     
  8. balamw Moderator

    Staff Member

    Joined:
    Aug 16, 2005
    Location:
    New England
    #8
    Ghostscript should be able to do it.

    Code:
    gs -sDEVICE=pdfwrite -dNOPAUSE -dQUIET -dBATCH -dFirstPage=m -dLastPage=n -sOutputFile=out.pdf in.pdf 
    pdfselect from here http://www.math.uni-heidelberg.de/studinfo/gerhardt/tex/ might also help. It uses Ghostscript to pull out the individual pages. Haven't used it, but seems to meet the criteria.
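
    To get one file per page out of the Ghostscript command above, the first and last page can be set to the same number and the command repeated for each page. A rough Python sketch of that loop, assuming gs is on the PATH, the page count is already known, and the output folder exists; the function names and the page1.pdf, page2.pdf naming are only illustrative.

```python
import subprocess


def gs_page_command(src, page, out_path):
    """Build the Ghostscript argument list that extracts a single page."""
    return [
        "gs", "-sDEVICE=pdfwrite", "-dNOPAUSE", "-dQUIET", "-dBATCH",
        f"-dFirstPage={page}", f"-dLastPage={page}",
        f"-sOutputFile={out_path}", src,
    ]


def split_pdf(src, page_count, out_dir="."):
    """Run Ghostscript once per page, writing page1.pdf, page2.pdf, ..."""
    for page in range(1, page_count + 1):
        cmd = gs_page_command(src, page, f"{out_dir}/page{page}.pdf")
        subprocess.run(cmd, check=True)  # raises if gs exits with an error
```

    For example, split_pdf("in.pdf", 5, "pages") would call gs five times. Passing the arguments as a list avoids shell quoting problems with file names that contain spaces.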

    B
     
  9. superbovine macrumors 68030

    Joined:
    Nov 7, 2003
    #9
    Whatever you can do with Automator you should be able to do with AppleScript, which I don't know anything about.
     
  10. telecomm macrumors 65816

    Joined:
    Nov 30, 2003
    Location:
    Rome
    #10
    I wouldn't mind finding a convenient way to do this too. I use an Automator workflow to do this myself, but it's not an elegant solution, and the pages come out with weird, unsystematic names. There doesn't seem to be any way to get it to print the first page, then the second page, etc.

    Anyway, attached is a picture of the workflow I've been using. It makes a new folder called Page Extraction, then takes the file selected in the Finder and outputs the pages into the newly created folder.

    It's really inefficient, though, so the number of occurrences of "Extract..." should be tailored pretty closely to the size of the document. (I've got a bunch of these workflows for different sizes of documents: n occurrences of "Extract" in a workflow will handle documents with up to 2^n pages.)
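
    The 2^n figure can be checked with a small Python sketch. It assumes every "Extract" round splits each piece into its odd-positioned and even-positioned pages, so each round roughly halves the piece sizes; the function names are hypothetical.

```python
def rounds_needed(page_count):
    """Smallest n with 2**n >= page_count, i.e. ceil(log2(page_count))."""
    if page_count < 1:
        raise ValueError("page_count must be at least 1")
    return (page_count - 1).bit_length()


def split_rounds(page_count, rounds):
    """Split every piece into odd and even halves, `rounds` times over."""
    pieces = [list(range(1, page_count + 1))]
    for _ in range(rounds):
        pieces = [half for piece in pieces
                  for half in (piece[0::2], piece[1::2]) if half]
    return pieces


if __name__ == "__main__":
    n = rounds_needed(5)
    print(n, split_rounds(5, n))
```

    For a 5-page document two rounds still leave pages 1 and 5 together in one piece, so three rounds are needed.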
     

  11. scott182 macrumors member

    Joined:
    May 23, 2004
    Location:
    Madison, WI
    #11
    xpdf includes a utility called pdfinfo that can display the number of pages in a PDF. If you can get at this information from the output, you can then use the pdfselect shell script, as suggested above, to extract each page to a different file with names pdf_file-1, pdf_file-2, etc.

    You'll need the developer tools installed (I think) to compile xpdf. Then write a script (Perl, Python, etc.) that will get the number of pages from the pdfinfo output and then run pdfselect on the file. As far as I can tell, there is no easy way to output only the number of pages using pdfinfo.

    Here is sample output from pdfinfo:

    Code:
    Scotts-Computer:~/Desktop scott$ pdfinfo Higher_Order_Messaging_OOPSLA_2005.pdf 
    Creator:        TeX
    Producer:       pdfTeX-1.20a
    CreationDate:   Sun Jul  3 22:47:04 2005
    Tagged:         no
    Pages:          12
    Encrypted:      no
    Page size:      594.99 x 841.99 pts (A4)
    File size:      284256 bytes
    Optimized:      no
    PDF version:    1.4
    
    Then the script would run pdfselect as follows (once you get the Pages: information from pdfinfo, a simple loop can be used):

    Code:
    csh pdfselect 1 2 3 4 5 6 7 8 9 10 11 12 Higher_Order_Messaging_OOPSLA_2005
    
    The one downside is that this is not terribly fast (took about 5 seconds per page on my system), but it's still probably faster than Automator.
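
    As a rough illustration of the two steps described above, here is a minimal Python sketch that pulls the number out of the "Pages:" line of pdfinfo output and builds the matching pdfselect command line. It assumes the output has the layout shown in the sample; the function names and the default pdfselect path are placeholders.

```python
import re


def parse_page_count(pdfinfo_output):
    """Return the number from the 'Pages:' line of pdfinfo output."""
    match = re.search(r"^Pages:\s+(\d+)", pdfinfo_output, re.MULTILINE)
    if match is None:
        raise ValueError("no 'Pages:' line found in pdfinfo output")
    return int(match.group(1))


def pdfselect_command(page_count, base_name, pdfselect="pdfselect"):
    """Build: csh pdfselect 1 2 ... N base_name (file name without .pdf)."""
    pages = [str(page) for page in range(1, page_count + 1)]
    return ["csh", pdfselect, *pages, base_name]
```

    The text passed to parse_page_count could come from subprocess.run(["pdfinfo", path], capture_output=True, text=True).stdout, which avoids writing the output to a temporary file.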

    Let me know if this makes sense, or if you need help writing the script.
     
  12. scott182 macrumors member

    Joined:
    May 23, 2004
    Location:
    Madison, WI
    #12
    Well, I went ahead and wrote up a script to do this. The coding could be better, but it works well and rather quickly (took about 15 seconds for a 6 page PDF).

    You'll need teTeX (you'll have this if you have installed LaTeX), pdfselect, and pdfinfo (links in previous post).

    Save the following script as "splitpdf" or anything you want to call it. Be sure to change the path for the variables $pdfselect and $temp, and make sure that splitpdf is executable (chmod 755 splitpdf).

    Code:
    #!/usr/bin/perl
    use strict;
    use warnings;
    
    #######################################################################
    # This script will take a PDF file as input and                       #
    #   split it into new PDF files, 1 per page.                          #
    #                                                                     #
    # Usage: ./splitpdf pdf_filename                                      #
    #                                                                     #
    # Requirements:                                                       #
    #     teTeX (typically through LaTeX)                                 #
    #     pdfselect (shell script)                                        #
    #       http://www.math.uni-heidelberg.de/studinfo/gerhardt/pdfselect #
    #     pdfinfo (packaged with xpdf)                                    #
    #######################################################################
    
    ### Change to appropriate values for your case
    ### $temp is a temporary location for the file containing pdfinfo output
    
    my $pdfselect = "/Users/scott/Desktop/pdfselect";
    my $temp = "/Users/scott/Desktop";
    
    my $pdf_file = $ARGV[0];
    die "Usage: $0 pdf_filename\n" unless defined $pdf_file;
    
    # Strip a trailing .pdf so the base name can be passed to pdfselect
    $pdf_file =~ s/\.pdf$//;
    
    # Write the pdfinfo output to a temporary file (quoted in case of spaces)
    system("pdfinfo \"$pdf_file.pdf\" > \"$temp/pdfinfo.txt\"") == 0
        or die "pdfinfo failed: $?\n";
    
    open(my $info, '<', "$temp/pdfinfo.txt")
        or die "Error opening pdfinfo.txt: $!";
    
    # Pull the page count out of the "Pages:" line
    my $num_pages;
    while (<$info>) {
        if (/^Pages:\s+(\d+)/) {
            $num_pages = $1;
        }
    }
    
    close($info);
    unlink("$temp/pdfinfo.txt");
    
    die "Could not find a page count in the pdfinfo output\n" unless $num_pages;
    
    # Build the list of page numbers: "1 2 3 ... N"
    my $pages_string = join(" ", 1 .. $num_pages);
    
    # Hand every page number to pdfselect in a single call
    system("csh \"$pdfselect\" $pages_string \"$pdf_file\"");
    
    # Clean up the leftover texexec-mpgraph.mp file
    unlink("$temp/texexec-mpgraph.mp");
     
  13. superbovine macrumors 68030

    Joined:
    Nov 7, 2003
    #13
    nice work...
     
  14. superwoman macrumors regular

    Joined:
    Apr 25, 2005
    Location:
    Monterey,CA
    #14
    I second pdfselect. It's the right tool to do what you want, and you can easily put it in a shell-script.
     
  15. telecomm macrumors 65816

    Joined:
    Nov 30, 2003
    Location:
    Rome
    #15
  16. Blackheart thread starter macrumors 6502a

    Joined:
    Mar 13, 2004
    Location:
    Seattle
    #16
  17. cookie1105 macrumors 6502

    Joined:
    Mar 27, 2006
    Location:
    London, UK
    #17
    Good but obscure thread. Downloaded PDFLab, it worked a treat. Helped solve my problem. Thanks for the advice.


    Cheers
     
