|Apr 6, 2012, 07:43 PM||#1|
PDF command line tools
I wrote three simple, single-purpose command line utilities to manipulate PDF documents:
pdfcat - concatenate (join, fuse) PDF documents
pdfcrop - crop (adjust margins of) PDF documents
pdfsplit - split (extract pages from) PDF documents
They can be downloaded here. Each comes with its own man page. There is also a small shell script that moves the tools and man pages to the appropriate directories, but running it is optional. See the Readme file for details.
The idea is to have something lightweight (the tarball is less than 60 kB in size) and panic-proof that works immediately after download, without any dependencies, compilation, installation, setup or learning curve. The two tools that take arguments beyond the obvious file names (pdfcrop and pdfsplit) understand plain English, and the order of the arguments is optimized for interactive use: the file to crop or split, the crop size or page ranges, and redirection of the standard output to a new file. For basic use, there are no options to memorize.
The tools work just as well in scripts, of course, and each can be used as a filter in a pipe.
For those less familiar with the Terminal, these are the keys one uses to navigate man pages: space bar, d, e (down one page, half a page, one line), b, u, y (up one page, half a page, one line), g, G (start, end), and /, n (search, next), q (quit). Surprisingly efficient. Factored differently: space bar, b (down, up one page), d, u (down, up half a page), e, y (down, up one line).
To quickly check the result of your command line PDF manipulations in Preview, use OS X's "open" command: $ open file.pdf (the dollar sign is the bash prompt).
As a MacRumors member it seemed natural to post here first, but if you know other places where people would find the PDF tools, please come forward. Also, any comments, questions, ideas, complaints or other feedback you might have will be greatly appreciated; use the e-mail address at the bottom of the man pages and Readme file, or post here to have your voice heard by (and tap the wisdom of) thousands of people instead of just me.
|Apr 7, 2012, 08:02 AM||#3|
Thanks, I know that. The commands are meant for when you are already busy in Terminal, for when there are many documents or documents with many pages to process, and for scripts. I find them convenient.
P.S.: I forgot in my original post:
HELP *** Testers needed !!! *** HELP
If you are running Leopard, Snow Leopard, Lion, or Mountain Lion, especially on 64-bit Intel hardware, have 5 minutes to spare and an idle PDF lying around, it would be great if you could try at least one of the tools and report back that it works.
|Apr 7, 2012, 09:24 PM||#5|
Thank you for asking, superwoman. pdftk is a great package and far more comprehensive, except for cropping which I think it can't do. I would say that it plays in another league. Here is a comparison:
pdftk is multi-platform and comes with its own PDF library.
These tools are OS X only and use the system PDF library (the same as Preview).
pdftk is a 16 MB download.
These tools are a 55 kB (!) download.
pdftk must be compiled (requiring the developer tools, and time) or installed, and installs its libraries.
These tools work out of the box, on Intel & PowerPC, and don't install or modify anything.
pdftk is rather strict with regard to option syntax.
These tools are extremely lenient and parse plain English as well as various option idioms.
pdftk rolls concatenation and splitting into one with its smart "cat" operator.
These tools make a clean distinction between the two, resulting in simpler syntax at the expense of flexibility.
pdftk can't crop. (Please correct me if I am wrong.)
These tools can, and have all the units and paper sizes I could find on Wikipedia built-in.
pdftk can perform sophisticated operations (rotate pages, fill in forms, add watermarks, encrypt documents, ...).
These tools can't; they do just the basics (split, combine, and crop).
When I needed this kind of basic PDF functionality three years ago, I found pdftk at once overkill and lacking, discovered the neat OS X Cocoa APIs, and wrote these tools, purely for myself. Since they turned out well, I thought it a shame to have them sit idle on my hard disk while they could be helping others, so I recently took the time to polish them up, write the man pages, and package everything for distribution.
Some of the above points are moot on today's hardware, e.g., download sizes, or the load times and memory consumption of libraries, but I believe my emphasis on convenience and ease of use has merit: people tend to go looking for tools like these when they are operating in panic mode, with a deadline looming and 1000 other things to do that they hadn't thought about. In such cases, unless both the dev tools and the right package manager are already installed, compilation from source is not an option. The pdftk home page does offer pre-compiled builds for Snow Leopard and Panther, but I have no experience with them.
To summarize: these tools are OS X native and "just work", instantly, have a straightforward syntax, but are limited to basic operations.
|Mar 4, 2013, 10:07 AM||#6|
Appropriate for specific uses
Obviously, I'd use something like pdftk for situations where that many features and that much flexibility are needed, but these work great for what I need.
|Mar 8, 2013, 02:34 PM||#7|
Following a user request, I added a command line tool to burst multipage PDF documents into single pages. From the man page:
NAME
     pdfburst -- burst (split) PDF documents into single pages

SYNOPSIS
     pdfburst file [path]

DESCRIPTION
     The pdfburst utility bursts (splits) the PDF document file into single
     pages which it writes to path, appended by an underscore character and
     zero-padded page numbers. If file is a single dash (-), the PDF document
     is read from the standard input.

     If path is omitted, the base name (last path component) of file is used
     and the single page files are created in the current working directory.
     If path ends with a slash (/), it designates a directory and the single
     page files are named with just the page number. Missing directories
     along path are created.
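To make the naming rule in the man page excerpt concrete, here is a rough shell sketch that only computes the name one single-page file would get; it does not call pdfburst itself. The function name burst_name is made up for illustration. Two details are assumptions, because the excerpt does not state them: that the ".pdf" extension is stripped from the base name and re-appended to each output file, and that page numbers are padded to the width of the total page count.

```shell
#!/bin/sh
# Sketch of the output-naming rule from the pdfburst man page excerpt.
# ASSUMPTIONS (not stated in the excerpt): ".pdf" is stripped from the base
# name and re-appended, and the padding width equals the number of digits
# in the total page count.

# burst_name FILE DEST PAGE TOTAL  -> prints the name of one single-page file
burst_name() {
    file=$1
    dest=$2
    page=$3
    total=$4

    width=${#total}                        # digits needed for the last page
    num=$(printf "%0${width}d" "$page")    # zero-padded page number

    case $dest in
        '')  # path omitted: base name of FILE, in the current directory
             prefix=$(basename "$file" .pdf)_ ;;
        */)  # trailing slash: a directory, files named by page number only
             prefix=$dest ;;
        *)   # otherwise: path, underscore, page number
             prefix=${dest}_ ;;
    esac

    printf '%s%s.pdf\n' "$prefix" "$num"
}

burst_name docs/report.pdf ''     3 12    # report_03.pdf
burst_name docs/report.pdf out/ch 3 12    # out/ch_03.pdf
burst_name docs/report.pdf pages/ 3 120   # pages/003.pdf
```

The three calls correspond to the three cases in the description: path omitted, path given as a prefix, and path ending in a slash. The real tool may differ in the details flagged as assumptions above.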
|Jun 15, 2013, 09:02 AM||#8|
Good work! Unfortunately, 10.8.4 broke them :/
Thanks for the work you did in releasing these tools! They made quick work of some daily PDF processing jobs I run. However, it seems that the recently released 10.8.4 update changed the core libraries, and now pdfsplit fails with:
2013-06-15 08:51:20.596 pdfsplit[21234:707] -[__NSCFNumber annotations]: unrecognized selector sent to instance 0x144410
2013-06-15 08:51:20.651 pdfsplit[21234:707] *** Terminating app due to uncaught exception 'NSInvalidArgumentException', reason: '-[__NSCFNumber annotations]: unrecognized selector sent to instance 0x144410'
*** Call stack at first throw:
0 CoreFoundation 0x95851e8b __raiseError + 219
1 libobjc.A.dylib 0x944ac52e objc_exception_throw + 230
2 CoreFoundation 0x95855afd -[NSObject(NSObject) doesNotRecognizeSelector:] + 253
3 CoreFoundation 0x9579de87 ___forwarding___ + 487
4 CoreFoundation 0x9579dc32 _CF_forwarding_prep_0 + 50
5 PDFKit 0x90207227 -[PDFDocument removePageAtIndex:] + 360
6 pdfsplit 0x0000366f pdfsplit + 9839
7 pdfsplit 0x00001d9a pdfsplit + 3482
8 pdfsplit 0x00001cc1 pdfsplit + 3265
Trace/BPT trap: 5
I hope an update would be quick and easy. Again, thanks for everything!