
Blackheart

macrumors 6502a
Original poster
Mar 13, 2004
938
0
Seattle
For work, we need to be able to easily split a multipage PDF. For instance, if I have a 5-page PDF then I'd like to have 5 PDFs output to a folder (hopefully with recognizable names such as "page1.pdf", "page2.pdf", etc.). Manually printing every individual page to PDF is too time consuming.

I've found a couple shareware programs that do this... but they cost money for something that I'd think should be fairly easy to do. I'll accept built-in OS X ideas, freeware ideas, and even command-line ideas (I can script it if I know of the tool).

EDIT: If it helps, we have Adobe CS2 (Including Acrobat Pro 7)
 
CanadaRAM said:
Acrobat Pro will do this, of course, but manually. I would think that Acrobat plus some clever Applescription....

Hey! I was just going to say that! :D

But there are Batch Processing tools within Acrobat Pro where you can set up your own sequence -- some time spent with these may pay some dividends. It's under 'Advanced'.
 
How are you getting the original 5-page PDF? Is it generated someplace? That is where I would start. For example, if it comes from a web-based tool that generates a PDF, or it's just a 5-page PDF that people fill out and email to you, it might be easier to get them to change that end, depending on what it is.
 
aha!

Well, I think I figured it out. I am too lazy to try it, but open Automator.

Make this workflow:

1. Get Specified Finder Items (under Finder)

2. Extract Odd & Even Pages, with extract set to odd (under PDF)

3. Extract Odd & Even Pages, with extract set to odd
.
.
add more here...
.

4. Print Finder Items.

5. Save the workflow.

There will be four permutations of this. You will need to save a workflow for each, or you can make one big giant one.

1. odd odd
2. odd even
3. even even
4. even odd

I hope this works.

If you want a big giant one, just start with step one again after step 4 in the workflow and repeat...

if it doesn't work, post back here and i'll make it work. I really need to sleep now...usually when i sleep i come up with a better answer...

EDIT: the above is wrong...

bahah as soon as i hit the pillow... you said 5 pages...

1. odd odd odd (1st page)
2. odd even (3rd page)
3. odd odd even (5th page)
4. even odd (2nd page)
5. even even (4th page)
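
The odd/even sequences follow a simple rule, sketched below under the assumption that each "Extract Odd & Even Pages" step keeps every other page of whatever document it receives (odd keeps positions 1, 3, 5, ...; even keeps 2, 4, ...). The function name extraction_path is made up for illustration; it is not part of Automator.

```sh
# Print the odd/even extraction sequence that isolates one page.
# Usage: extraction_path PAGE TOTAL_PAGES
extraction_path() {
    local page="$1" total="$2" path=""
    # Keep halving the document until a single page is left.
    while [ "$total" -gt 1 ]; do
        if [ $((page % 2)) -eq 1 ]; then
            path="$path odd"
            page=$(( (page + 1) / 2 ))    # new position among the odd pages
            total=$(( (total + 1) / 2 ))  # odd pages left after this step
        else
            path="$path even"
            page=$(( page / 2 ))          # new position among the even pages
            total=$(( total / 2 ))        # even pages left after this step
        fi
    done
    # Drop the leading space before printing.
    printf '%s\n' "${path# }"
}
```

For example, `extraction_path 5 5` prints `odd odd even`, the sequence for the 5th page of a five-page document.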
 
Anything a little more automatic and scalable? I mentioned a 5 page PDF just as an example. At my work, we'll need to split PDFs daily with all different amounts of pages. I just want a script that I select the file, click GO and bada-bing bada-boom, multiple PDFs.

superbovine said:
How are you getting the original 5-page PDF? Is it generated someplace? That is where I would start. For example, if it comes from a web-based tool that generates a PDF, or it's just a 5-page PDF that people fill out and email to you, it might be easier to get them to change that end, depending on what it is.

Out of Adobe InDesign.
 
Blackheart said:
Anything a little more automatic and scalable? I mentioned a 5 page PDF just as an example. At my work, we'll need to split PDFs daily with all different amounts of pages. I just want a script that I select the file, click GO and bada-bing bada-boom, multiple PDFs.



Out of Adobe InDesign.

Whatever you can do with Automator you should be able to do with AppleScript, which I don't know anything about.
 
I wouldn't mind finding a convenient way to do this too. I use an Automator workflow to do this myself, but it's not an elegant solution, and the pages come out with weird, unsystematic names. There doesn't seem to be any way to get it to print the first page, then the second page, etc.

Anyway, attached is a picture of the workflow I've been using—it makes a new folder called Page Extraction, then takes the file selected in the finder and outputs the pages into the newly created folder.

It's really inefficient, though, so the number of occurrences of "Extract..." should be tailored pretty closely to the size of the document. (I've got a bunch of these workflows for different sizes of documents: n occurrences of "Extract" in a workflow will handle documents with up to 2^n pages.)
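
To pick the right workflow for a given document, the 2^n relationship can be turned around: the number of "Extract" steps needed is the smallest n with 2^n at least the page count. A rough sketch (extract_steps_needed is just an illustrative name):

```sh
# Print the smallest n such that 2^n >= PAGES.
# Usage: extract_steps_needed PAGES
extract_steps_needed() {
    local pages="$1" steps=0 capacity=1
    # Each extra "Extract" step doubles the document size the workflow can handle.
    while [ "$capacity" -lt "$pages" ]; do
        capacity=$((capacity * 2))
        steps=$((steps + 1))
    done
    printf '%s\n' "$steps"
}
```

So a 5-page document needs 3 steps, and a 9-page document needs 4.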
 

Attachments: Picture 1.png, Picture 2.png
xpdf includes a utility called pdfinfo that can display the number of pages in a PDF. If you can get at this information from the output, you can then use the pdfselect shell script, as suggested above, to extract each page to a different file with names pdf_file-1, pdf_file-2, etc.

You'll need the developer tools installed (I think) to compile xpdf. Then write a script (Perl, Python, etc.) that will get the number of pages from the pdfinfo output and then run pdfselect on the file. As far as I can tell, there is no easy way to output only the number of pages using pdfinfo.

Here is sample output from pdfinfo:

Code:
Scotts-Computer:~/Desktop scott$ pdfinfo Higher_Order_Messaging_OOPSLA_2005.pdf 
Creator:        TeX
Producer:       pdfTeX-1.20a
CreationDate:   Sun Jul  3 22:47:04 2005
Tagged:         no
Pages:          12
Encrypted:      no
Page size:      594.99 x 841.99 pts (A4)
File size:      284256 bytes
Optimized:      no
PDF version:    1.4
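
Since pdfinfo itself does not print the page count on its own, one option is to filter its output. A minimal sketch, assuming the "Pages:" line looks like the sample above; pdf_page_count is a made-up helper name that reads pdfinfo output on standard input:

```sh
# Read pdfinfo output on stdin and print only the page count.
pdf_page_count() {
    # The count is the second whitespace-separated field of the "Pages:" line.
    awk '/^Pages:/ { print $2; exit }'
}

# Typical use (assumes pdfinfo is on the PATH):
#   pdfinfo Higher_Order_Messaging_OOPSLA_2005.pdf | pdf_page_count
```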

Then the script would run pdfselect as follows (once you get the Pages: information from pdfinfo, a simple loop can be used):

Code:
csh pdfselect 1 2 3 4 5 6 7 8 9 10 11 12 Higher_Order_Messaging_OOPSLA_2005

The one downside is that this is not terribly fast (took about 5 seconds per page on my system), but it's still probably faster than Automator.
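
The "simple loop" could look something like this in plain sh. It is only a sketch: page_list is an illustrative name, and the page count is assumed to have been read from pdfinfo already.

```sh
# Print "1 2 3 ... N" for use as the page arguments to pdfselect.
# Usage: page_list N
page_list() {
    local last="$1" page=1 list=""
    while [ "$page" -le "$last" ]; do
        list="$list$page "
        page=$((page + 1))
    done
    # Trim the trailing space before printing.
    printf '%s\n' "${list% }"
}

# Typical use, matching the command above:
#   csh pdfselect $(page_list 12) Higher_Order_Messaging_OOPSLA_2005
```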

Let me know if this makes sense, or if you need help writing the script.
 
Well, I went ahead and wrote up a script to do this. The coding could be better, but it works well and rather quickly (took about 15 seconds for a 6 page PDF).

You'll need teTeX (you'll have this if you have installed LaTeX), pdfselect, and pdfinfo (links in previous post).

Save the following script as "splitpdf" or anything you want to call it. Be sure to change the path for the variables $pdfselect and $temp, and make sure that splitpdf is executable (chmod 755 splitpdf).

Code:
#!/usr/bin/perl

#######################################################################
# This script will take a PDF file as input and                       #
#   split it into new PDF files, 1 per page.                          #
#                                                                     #
# Usage: ./splitpdf pdf_filename                                      #
#                                                                     #
# Requirements:                                                       #
#     teTeX (typically through LaTeX)                                 #
#     pdfselect (shell script)                                        #
#       http://www.math.uni-heidelberg.de/studinfo/gerhardt/pdfselect #
#     pdfinfo (packaged with xpdf)                                    #
#######################################################################

### Change to appropriate values for your case
### $temp is a temporary location for the file containing pdfinfo output

$pdfselect = "/Users/scott/Desktop/pdfselect";
$temp = "/Users/scott/Desktop";

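$pdf_file = $ARGV[0];
die "Usage: $0 pdf_filename\n" unless defined $pdf_file;

# Strip a trailing .pdf extension; pdfselect is given the bare name.
if($pdf_file =~ /^(.+)\.pdf$/){
    $pdf_file = $1;
    }

# Quote the paths so that file names containing spaces survive the shell.
system("pdfinfo \"$pdf_file.pdf\" > \"$temp/pdfinfo.txt\"") == 0
    || die "pdfinfo failed on $pdf_file.pdf";

open(INFO, "$temp/pdfinfo.txt") || die "Error opening pdfinfo.txt $!";

# Pull the page count out of the "Pages:" line.
while(<INFO>){
    if($_ =~ /^Pages:\s+(\d+)/){
        $num_pages = $1;
        }
    }

close(INFO);

unlink("$temp/pdfinfo.txt");

die "Could not find a page count for $pdf_file.pdf" unless $num_pages;

# Build the list of page numbers: "1 2 3 ... N"
for ($i=1; $i<=$num_pages; $i++){
    $pages_string .= "$i ";
    }

system("csh \"$pdfselect\" $pages_string \"$pdf_file\"");
unlink("$temp/texexec-mpgraph.mp");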
 
scott182 said:
Well, I went ahead and wrote up a script to do this. The coding could be better, but it works well and rather quickly (took about 15 seconds for a 6 page PDF).

You'll need teTeX (you'll have this if you have installed LaTeX), pdfselect, and pdfinfo (links in previous post).

Save the following script as "splitpdf" or anything you want to call it. Be sure to change the path for the variables $pdfselect and $temp, and make sure that splitpdf is executable (chmod 755 splitpdf).

nice work...
 
balamw said:
Ghostscript should be able to do it.

Code:
gs -sDEVICE=pdfwrite -dNOPAUSE -dQUIET -dBATCH -dFirstPage=m -dLastPage=n -sOutputFile=out.pdf in.pdf

pdfselect from here http://www.math.uni-heidelberg.de/studinfo/gerhardt/tex/ might also help. It uses Ghostscript to pull out the individual pages. Haven't used it, but seems to meet the criteria.

B

I second pdfselect. It's the right tool to do what you want, and you can easily put it in a shell-script.
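
For anyone who would rather skip pdfselect and teTeX, here is a rough sketch that wraps the gs command quoted above in a loop, one call per page. It assumes gs and pdfinfo are both on the PATH and has not been timed against the Perl script; split_pdf and the page1.pdf, page2.pdf, ... naming are my own choices, not anything Ghostscript provides.

```sh
# Split INPUT into OUTDIR/page1.pdf, OUTDIR/page2.pdf, ...
# Usage: split_pdf INPUT.pdf OUTDIR
split_pdf() {
    local input="$1" outdir="$2" pages page=1
    # Read the page count from the "Pages:" line of pdfinfo's output.
    pages=$(pdfinfo "$input" | awk '/^Pages:/ { print $2; exit }')
    if [ -z "$pages" ]; then
        echo "could not read a page count from $input" >&2
        return 1
    fi
    mkdir -p "$outdir" || return 1
    # Run gs once per page, with FirstPage and LastPage set to the same number.
    while [ "$page" -le "$pages" ]; do
        gs -sDEVICE=pdfwrite -dNOPAUSE -dQUIET -dBATCH \
            -dFirstPage="$page" -dLastPage="$page" \
            -sOutputFile="$outdir/page$page.pdf" "$input" || return 1
        page=$((page + 1))
    done
}
```

Calling `split_pdf in.pdf pages` would leave the individual pages in a folder named pages.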
 