Not quite what I meant.
Imagine the following processor instruction:
ADD X+Y+Z = R1
In imaginary microprocessor 1 (a hypothetical RISC processor, for example), there is a three-input adder. So X, Y, and Z are sent to the adder, and the result comes out the other side, all in one clock cycle (or however many clock cycles the adder takes).
In imaginary microprocessor 2, there is only a 2-input adder. So this instruction is intercepted by the instruction decoder, which breaks it into two pieces of microcode:
ADD X+Y = R
ADD R+Z = R1
A state machine needs to keep track of the interrelationship between these two instructions and make sure they issue in the right order, and that the intermediate result (R) is stored someplace temporary where it can be retrieved for the second instruction.
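Here's a rough C sketch of that same decomposition, just to show the dependency in familiar terms (the variable names x, y, z, r, r1 are only stand-ins for the registers above, not anyone's actual microcode):

/* Hypothetical values standing in for registers X, Y, Z, R1.
 * The "microcoded" version needs a temporary (r), and the second
 * add can't start until the first one finishes -- that's the
 * dependency the state machine has to track. */
#include <stdio.h>

int main(void) {
    int x = 1, y = 2, z = 3;

    /* Processor 1: one 3-input add, conceptually a single step. */
    int r1_wide = x + y + z;

    /* Processor 2: two dependent 2-input adds via a temporary. */
    int r  = x + y;     /* ADD X+Y = R   (must issue first)       */
    int r1 = r + z;     /* ADD R+Z = R1  (waits on the result R)  */

    printf("%d %d\n", r1_wide, r1);   /* same answer either way */
    return 0;
}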
Most RISC processors wouldn't allow this second sort of thing. Some, however, would allow things like:
ADD X[0:8] + Y[0:8] = Z[0:8]
which would be broken into:
ADD X0+Y0=Z0
ADD X1+Y1=Z1
...
where there is no relationship between the microcode instructions.
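In C terms, that second case looks like a plain loop over arrays (again, hypothetical data, just to illustrate the point): every iteration stands alone, so the micro-ops could issue in any order, or even in parallel, unlike the X+Y+Z case above.

/* Element-wise adds over made-up 8-element arrays.
 * No iteration depends on any other iteration's result. */
#include <stdio.h>

int main(void) {
    int x[8] = {1, 2, 3, 4, 5, 6, 7, 8};
    int y[8] = {8, 7, 6, 5, 4, 3, 2, 1};
    int z[8];

    for (int i = 0; i < 8; i++) {
        z[i] = x[i] + y[i];   /* ADD Xi+Yi = Zi -- independent of z[i-1] */
    }

    for (int i = 0; i < 8; i++) printf("%d ", z[i]);
    printf("\n");
    return 0;
}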
Yup, and it's important to note that on the surface, one way is not necessarily better than the other. Sure, the 2nd method might require more steps, but because those steps are simpler, they MIGHT be processed faster than the single, larger 3-way adder instruction...
That book I referenced, if you get into all the math, will help you understand how to do the performance analysis and determine not only HOW to optimize your CPU design (assuming you're designing a processor) but also WHICH parts to optimize and how much of a speedup you'll gain from each.
For example, maybe adding that 3-way instruction takes an extra 10% of silicon on the CPU, but if it's an instruction that would get used a large amount of the time, and the performance gain over the two 2-way adds is big enough, then it might make sense to spend that extra 10%. Then again, it might not...
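Just to make that kind of tradeoff concrete, here's a back-of-the-envelope calculation in that spirit (the numbers are completely made up): if the 3-input adds account for some fraction of the running time and the fused instruction makes that part twice as fast, Amdahl's law gives you the overall speedup.

/* Amdahl's law sketch with hypothetical numbers:
 * overall speedup = 1 / ((1 - f) + f / s)
 * where f is the fraction of time spent in the part you sped up
 * and s is how much faster that part got. */
#include <stdio.h>

int main(void) {
    double f = 0.20;  /* made up: 20% of time spent in 3-input adds      */
    double s = 2.0;   /* made up: fused instruction makes them 2x faster */

    double speedup = 1.0 / ((1.0 - f) + f / s);
    printf("Overall speedup: %.3f\n", speedup);  /* about 1.11x here */
    return 0;
}

Whether an 11% overall speedup is worth 10% more silicon is exactly the kind of question that analysis is for.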
But yeah, with C, about the only things you usually need to worry about are the sizes of things like integers and floats on your given CPU, and usually even that gets abstracted away.
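For instance, something like this (standard C, nothing platform-specific) is about all the "CPU awareness" most C code needs: check sizes with sizeof, or just use the fixed-width types from <stdint.h> and stop thinking about it.

/* The sizes of the basic types can differ between CPUs and compilers,
 * which is about the only low-level detail most C programmers care about. */
#include <stdio.h>
#include <stdint.h>

int main(void) {
    printf("int:    %zu bytes\n", sizeof(int));
    printf("long:   %zu bytes\n", sizeof(long));
    printf("float:  %zu bytes\n", sizeof(float));
    printf("double: %zu bytes\n", sizeof(double));

    /* If you need an exact width regardless of CPU, use fixed-width types. */
    int32_t exactly_32_bits = 0;
    int64_t exactly_64_bits = 0;
    printf("int32_t: %zu, int64_t: %zu\n",
           sizeof(exactly_32_bits), sizeof(exactly_64_bits));
    return 0;
}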
Heck, you can get even farther away from the CPU by using languages like Java that don't even compile to a real CPU architecture, but rather to a "virtual machine" with its own generic "Java CPU" concept that the virtual machine implements. You compile your code to "bytecode", which the virtual machine understands and converts (on the fly, usually using a "just-in-time compiler") into the low-level CPU-specific code. So, you write your Java, it compiles to generic Java bytecode, and the VM is the only thing specific to your CPU + OS.
Anyhow, CPU design and architectures can be a very VERY interesting thing to study, but in a lot of ways, unless you're getting into CPU design for a living, you really don't need to worry too much about it. Understand the basic concepts and the major things like pipelining and superscalar execution, and you'll know more than enough. You could be a very good programmer and not know jack squat about the CPU itself...