My crack at the analogy
Here's my take on it, based on my work with an 8-bit PIC18F452 microcontroller (essentially a simple computer with RAM built onto the chip and lots of little I/O peripherals attached):
The PIC18F452 is an 8-bit computer. It is a very popular chip due to its simple instruction set and low cost. However, it can only deal with 8 bits of data "at a time." I say "at a time" because you can still have operations that manipulate 16-bit values, such as pointers. To do this, though, the PIC must first operate on the lower 8 bits and then operate on the upper 8 bits. This effectively allows the PIC to deal with variables that are 16 bits wide without being a 16-bit computer (i.e., having a 16-bit data bus, a 16-bit instruction set, etc.).
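To make that concrete, here's a rough sketch in C of what a 16-bit addition looks like when you can only work 8 bits at a time: the low bytes are added first, and the carry out of that add is folded into the high-byte add. (The function name is my own; this just illustrates the two-pass idea, not actual PIC assembly.)

```c
#include <stdint.h>

/* Add two 16-bit values using only 8-bit operations, the way an
   8-bit CPU would: low byte first, then high byte plus the carry
   that came out of the low-byte addition. */
uint16_t add16_via_8bit(uint16_t a, uint16_t b) {
    uint8_t a_lo = a & 0xFF, a_hi = a >> 8;
    uint8_t b_lo = b & 0xFF, b_hi = b >> 8;

    uint8_t sum_lo = (uint8_t)(a_lo + b_lo);
    uint8_t carry  = sum_lo < a_lo;              /* did the low byte overflow? */
    uint8_t sum_hi = (uint8_t)(a_hi + b_hi + carry);

    return (uint16_t)((sum_hi << 8) | sum_lo);
}
```

So `add16_via_8bit(0x00FF, 0x0001)` gives `0x0100`: the low-byte add wraps to zero and the carry bumps the high byte. That carry propagation is exactly the extra work the 8-bit chip has to do.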
Were this same operation done by a 16-bit microcontroller, it could be completed in one pass, since the chip can look at the whole variable at once.
Now, neglecting the part that multiple execution units could play in this, the same logic scales up: a 32-bit processor (with 32-bit registers) can only deal with 32 bits at a time. When a program needs to work with a 64-bit variable, it has to deal with the lower 32 bits first and then the upper 32 bits, taking roughly twice as long for that operation. A 64-bit processor can manipulate any 64-bit value, whether it be a pointer, an integer, a double-precision float, etc., in a single pass, thus saving CPU time.
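The same split-and-carry pattern from the 8-bit case applies one level up. Here's a sketch (again with a made-up function name) of a 64-bit add built out of 32-bit halves, which is roughly what a compiler emits for 64-bit arithmetic on a 32-bit target:

```c
#include <stdint.h>

/* Add two 64-bit values using only 32-bit operations, mimicking
   what a 32-bit CPU must do: low word first, then high word plus
   the carry from the low-word addition. */
uint64_t add64_via_32bit(uint64_t a, uint64_t b) {
    uint32_t a_lo = (uint32_t)a, a_hi = (uint32_t)(a >> 32);
    uint32_t b_lo = (uint32_t)b, b_hi = (uint32_t)(b >> 32);

    uint32_t sum_lo = a_lo + b_lo;
    uint32_t carry  = sum_lo < a_lo;             /* low word overflowed? */
    uint32_t sum_hi = a_hi + b_hi + carry;

    return ((uint64_t)sum_hi << 32) | sum_lo;
}
```

A 64-bit processor does all of this in one native add instruction; the 32-bit one needs the two adds plus the carry handling shown above.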