Splitting a multipage PDF

Blackheart · Jan 19, 2006

For work, we need to be able to easily split a multipage PDF. For instance, if I have a 5-page PDF, then I'd like to have 5 PDFs output to a folder (hopefully with recognizable names such as "page1.pdf", "page2.pdf", etc.). Manually printing every individual page to PDF is too time-consuming.

I've found a couple of shareware programs that do this, but they cost money for something that I'd think should be fairly easy to do. I'll accept built-in OS X ideas, freeware ideas, and even command-line ideas (I can script it if I know of the tool).

EDIT: If it helps, we have Adobe CS2 (including Acrobat Pro 7).

CanadaRAM · Jan 19, 2006

Do you mean do this automatically as a batch operation?

Acrobat Pro will do this, of course, but manually. I would think that Acrobat plus some clever Applescription....

Blackheart · Jan 19, 2006

CanadaRAM said:
Do you mean do this automatically as a batch operation?

Acrobat Pro will do this, of course, but manually.

Definitely batch. Manually = bad beans.

Blue Velvet · Jan 19, 2006

CanadaRAM said:
Acrobat Pro will do this, of course, but manually. I would think that Acrobat plus some clever Applescription....

Hey! I was just going to say that!

But there are Batch Processing tools within Acrobat Pro where you can set up your own sequence -- some time spent with these may pay some dividends. It's under 'Advanced'.

superbovine · Jan 20, 2006

How are you getting the original 5-page PDF? Is it generated someplace? That is where I would start. For example, if it comes from a web-based tool that generates a PDF, or it's just a 5-page PDF that people fill out and email to you, it might be easier to get them to change that end, depending on what it is.

superbovine · Jan 20, 2006

aha!

Well, I think I figured it out. I am too lazy to try it, but open Automator.

make this workflow:

1. Get specified finder item (under finder)

2. extract odd & even pages set extract to odd (under pdf)

3. extract odd & even pages set extract to odd
.
.
add more here...
.
4. print finder item.

5. save the workflow

There will be four permutations of this. You will need to save a workflow for each, or make one big giant one.

1. odd odd
2. odd even
3. even even
4. even odd

I hope this works.

If you want a big giant one, just start with step one again after step 4 in the workflow and repeat...

if it doesn't work, post back here and i'll make it work. I really need to sleep now...usually when i sleep i come up with a better answer...

EDIT: the above is wrong...

bahah as soon as i hit the pillow... you said 5 pages...

1. odd odd odd (1st page)
2. odd even (3rd page)
3. odd odd even (5th page)
4. even even (4th page)
5. even odd (2nd page)

Blackheart · Jan 20, 2006

Anything a little more automatic and scalable? I mentioned a 5 page PDF just as an example. At my work, we'll need to split PDFs daily with all different amounts of pages. I just want a script that I select the file, click GO and bada-bing bada-boom, multiple PDFs.

superbovine said:
How are you getting the original 5-page PDF? Is it generated someplace? That is where I would start. For example, if it comes from a web-based tool that generates a PDF, or it's just a 5-page PDF that people fill out and email to you, it might be easier to get them to change that end, depending on what it is.

Out of Adobe InDesign.

balamw · Jan 20, 2006

Ghostscript should be able to do it.

Code:

gs -sDEVICE=pdfwrite -dNOPAUSE -dQUIET -dBATCH -dFirstPage=m -dLastPage=n -sOutputFile=out.pdf in.pdf

pdfselect from here http://www.math.uni-heidelberg.de/studinfo/gerhardt/tex/ might also help. It uses Ghostscript to pull out the individual pages. Haven't used it, but seems to meet the criteria.
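
A minimal sketch of how the gs command above could be looped to write one file per page, by setting -dFirstPage and -dLastPage to the same number on each run. The function name split_with_gs, the GS override variable, and the page1.pdf, page2.pdf naming (borrowed from the original question) are assumptions for illustration, and the page count has to be supplied by the caller.

```shell
#!/bin/sh
# split_with_gs INPUT.pdf PAGE_COUNT [OUTPUT_DIR]
# Runs Ghostscript once per page, using the same page number for
# -dFirstPage and -dLastPage so that each run writes exactly one page.
# GS is a hypothetical override for the Ghostscript binary (default: gs).
split_with_gs() {
    in=$1
    pages=$2
    outdir=${3:-.}
    mkdir -p "$outdir" || return 1
    i=1
    while [ "$i" -le "$pages" ]; do
        "${GS:-gs}" -sDEVICE=pdfwrite -dNOPAUSE -dQUIET -dBATCH \
            -dFirstPage="$i" -dLastPage="$i" \
            -sOutputFile="$outdir/page$i.pdf" "$in" || return 1
        i=$((i + 1))
    done
}
```

Something like `split_with_gs in.pdf 5 pages` would then be expected to leave page1.pdf through page5.pdf in a folder called pages. Ghostscript re-reads the input on every run, so this is unlikely to be fast on long documents.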

B

superbovine · Jan 20, 2006

Blackheart said:
Anything a little more automatic and scalable? I mentioned a 5 page PDF just as an example. At my work, we'll need to split PDFs daily with all different amounts of pages. I just want a script that I select the file, click GO and bada-bing bada-boom, multiple PDFs.

Out of Adobe InDesign.

Whatever you can do with Automator you should be able to do with AppleScript, which I don't know anything about.

telecomm · Jan 20, 2006

I wouldn't mind finding a convenient way to do this too. I use an Automator workflow to do this myself, but it's not an elegant solution, and the pages come out with weird, unsystematic names. There doesn't seem to be any way to get it to print the first page, then the second page, etc.

Anyway, attached is a picture of the workflow I've been using. It makes a new folder called Page Extraction, then takes the file selected in the Finder and outputs the pages into the newly created folder.

It's really inefficient, though, so the number of occurrences of "Extract..." should be tailored pretty closely to the size of the document. (I've got a bunch of these workflows for different sizes of documents: n occurrences of "Extract" in a workflow will handle documents with up to 2^n pages.)
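
To see where the 2^n figure comes from: each odd/even extraction roughly halves the document, so a chain of n extractions can single out one page from up to 2^n. The sketch below works out which chain isolates a given page, assuming "odd" keeps the 1st, 3rd, 5th... pages of whatever is left and "even" keeps the 2nd, 4th, and so on. The function name extract_sequence is made up for illustration and is not part of Automator.

```shell
#!/bin/sh
# extract_sequence PAGE TOTAL
# Prints the chain of "odd"/"even" extractions that leaves only PAGE
# from a TOTAL-page document. Each step halves the remaining pages.
extract_sequence() {
    p=$1        # position of the wanted page among the remaining pages
    n=$2        # number of pages remaining
    seq_out=""
    while [ "$n" -gt 1 ]; do
        if [ $((p % 2)) -eq 1 ]; then
            seq_out="$seq_out odd"
            p=$(( (p + 1) / 2 ))
            n=$(( (n + 1) / 2 ))
        else
            seq_out="$seq_out even"
            p=$((p / 2))
            n=$((n / 2))
        fi
    done
    echo "${seq_out# }"
}
```

For a 5-page document, `extract_sequence 3 5` prints `odd even` and `extract_sequence 5 5` prints `odd odd even`. The longest chain needs 3 steps, which fits since 2^3 = 8 covers 5 pages.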

scott182 · Jan 20, 2006

xpdf includes a utility called pdfinfo that can display the number of pages in a PDF. If you can get at this information from the output, you can then use the pdfselect shell script, as suggested above, to extract each page to a different file with names pdf_file-1, pdf_file-2, etc.

You'll need the developer tools installed (I think) to compile xpdf. Then write a script (Perl, Python, etc.) that will get the number of pages from the pdfinfo output and then run pdfselect on the file. As far as I can tell, there is no easy way to output only the number of pages using pdfinfo.

Here is sample output from pdfinfo:

Code:

Scotts-Computer:~/Desktop scott$ pdfinfo Higher_Order_Messaging_OOPSLA_2005.pdf 
Creator:        TeX
Producer:       pdfTeX-1.20a
CreationDate:   Sun Jul  3 22:47:04 2005
Tagged:         no
Pages:          12
Encrypted:      no
Page size:      594.99 x 841.99 pts (A4)
File size:      284256 bytes
Optimized:      no
PDF version:    1.4

Then the script would run pdfselect as follows (once you get the Pages: information from pdfinfo, a simple loop can be used):

Code:

csh pdfselect 1 2 3 4 5 6 7 8 9 10 11 12 Higher_Order_Messaging_OOPSLA_2005

The one downside is that this is not terribly fast (took about 5 seconds per page on my system), but it's still probably faster than Automator.

Let me know if this makes sense, or if you need help writing the script.
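
As a rough sketch of the two steps described above (pulling the number out of the "Pages:" line, then building the list of page numbers for pdfselect), something like the following might do. The helper names page_count and page_list are made up for illustration, and the awk filter assumes the "Pages:" line looks like the sample output.

```shell
#!/bin/sh
# page_count: reads pdfinfo output on stdin and prints only the
# number from the "Pages:" line.
page_count() {
    awk '/^Pages:/ { print $2; exit }'
}

# page_list N: prints "1 2 ... N" on one line, the form in which
# the page numbers are passed to pdfselect.
page_list() {
    i=1
    list=""
    while [ "$i" -le "$1" ]; do
        list="$list $i"
        i=$((i + 1))
    done
    echo "${list# }"
}
```

With these, `n=$(pdfinfo file.pdf | page_count)` would hold the page count, and `csh pdfselect $(page_list "$n") file` would correspond to the pdfselect call shown above.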

scott182 · Jan 20, 2006

Well, I went ahead and wrote up a script to do this. The coding could be better, but it works well and rather quickly (took about 15 seconds for a 6 page PDF).

You'll need teTeX (you'll have this if you have installed LaTeX), pdfselect, and pdfinfo (links in previous post).

Save the following script as "splitpdf" or anything you want to call it. Be sure to change the path for the variables $pdfselect and $temp, and make sure that splitpdf is executable (chmod 755 splitpdf).

Code:

#!/usr/bin/perl
use strict;
use warnings;

#######################################################################
# This script will take a PDF file as input and                       #
#   split it into new PDF files, 1 per page.                          #
#                                                                     #
# Usage: ./splitpdf pdf_filename                                      #
#                                                                     #
# Requirements:                                                       #
#     teTeX (typically through LaTeX)                                 #
#     pdfselect (shell script)                                        #
#       http://www.math.uni-heidelberg.de/studinfo/gerhardt/pdfselect #
#     pdfinfo (packaged with xpdf)                                    #
#######################################################################

### Change to appropriate values for your case
### $temp is a temporary location for the file containing pdfinfo output

my $pdfselect = "/Users/scott/Desktop/pdfselect";
my $temp = "/Users/scott/Desktop";

my $pdf_file = $ARGV[0];
die "Usage: $0 pdf_filename\n" unless defined $pdf_file;

# Strip a trailing .pdf extension; pdfselect takes the name without it
$pdf_file =~ s/\.pdf$//;

# Quote the file names so that paths containing spaces survive the shell
system("pdfinfo \"$pdf_file.pdf\" > \"$temp/pdfinfo.txt\"");

open(INFO, "$temp/pdfinfo.txt") || die "Error opening pdfinfo.txt $!";

# Pick the page count out of the "Pages:" line of the pdfinfo output
my $num_pages;
while(<INFO>){
    if($_ =~ /^Pages:\s+(\d+)/){
        $num_pages = $1;
        }
    }

close(INFO);

unlink("$temp/pdfinfo.txt");

die "Could not read the page count from pdfinfo\n" unless $num_pages;

# Build the page list "1 2 3 ... N" that is passed to pdfselect
my $pages_string = join(" ", 1 .. $num_pages);

system("csh \"$pdfselect\" $pages_string \"$pdf_file\"");
unlink("$temp/texexec-mpgraph.mp");

superbovine · Jan 20, 2006

scott182 said:
Well, I went ahead and wrote up a script to do this. The coding could be better, but it works well and rather quickly (took about 15 seconds for a 6 page PDF).

You'll need teTeX (you'll have this if you have installed LaTeX), pdfselect, and pdfinfo (links in previous post).

Save the following script as "splitpdf" or anything you want to call it. Be sure to change the path for the variables $pdfselect and $temp, and make sure that splitpdf is executable (chmod 755 splitpdf).

nice work...

superwoman · Jan 22, 2006

balamw said:
Ghostscript should be able to do it.

Code:

gs -sDEVICE=pdfwrite -dNOPAUSE -dQUIET -dBATCH -dFirstPage=m -dLastPage=n -sOutputFile=out.pdf in.pdf

pdfselect from here http://www.math.uni-heidelberg.de/studinfo/gerhardt/tex/ might also help. It uses Ghostscript to pull out the individual pages. Haven't used it, but seems to meet the criteria.

B

I second pdfselect. It's the right tool to do what you want, and you can easily put it in a shell-script.

telecomm · Mar 7, 2006

If you're still looking for solutions, this looks like it might be just what you need (and it's freeware).

http://www.versiontracker.com/dyn/moreinfo/macosx/24482

Blackheart · Mar 7, 2006

telecomm said:
If you're still looking for solutions, this looks like it might be just what you need (and it's freeware).

http://www.versiontracker.com/dyn/moreinfo/macosx/24482

Good find. Thanks!

cookie1105 · Aug 28, 2006

Good but obscure thread. Downloaded PDFLab; it worked a treat. Helped solve my problem. Thanks for the advice.

Cheers
