Not quite what I meant.
Imagine the following processor instruction:
ADD X+Y+Z = R1
In imaginary microprocessor 1 (a hypothetical RISC processor, for example), there is a three-input adder. So X, Y, and Z are sent to the adder, and the result comes out the other side, all in one clock cycle (or however many clock cycles the adder takes).
In imaginary microprocessor 2, there is only a 2-input adder. So this instruction is intercepted by the instruction decoder, which breaks it into two pieces of microcode:
ADD X+Y = R
ADD R+Z = R1
A state machine needs to keep track of the interrelationship between these two instructions and make sure they issue in the right order, and that the intermediate result (R) is stored someplace temporary where it can be retrieved for the second instruction.
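Here's a rough C sketch of that same decomposition, just to show the dependency in familiar terms (the variable names x, y, z, r, r1 are only stand-ins for the registers above, not anyone's actual microcode):

/* Hypothetical values standing in for registers X, Y, Z, R1.
 * The "microcoded" version needs a temporary (r), and the second
 * add can't start until the first one finishes -- that's the
 * dependency the state machine has to track. */
#include <stdio.h>

int main(void) {
    int x = 1, y = 2, z = 3;

    /* Processor 1: one 3-input add, conceptually a single step. */
    int r1_wide = x + y + z;

    /* Processor 2: two dependent 2-input adds via a temporary. */
    int r  = x + y;     /* ADD X+Y = R   (must issue first)       */
    int r1 = r + z;     /* ADD R+Z = R1  (waits on the result R)  */

    printf("%d %d\n", r1_wide, r1);   /* same answer either way */
    return 0;
}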
Most RISC processors wouldn't allow this second sort of thing. Some, however, would allow things like:
ADD X[0:8] + Y[0:8] = Z[0:8]
which would be broken into:
ADD X0+Y0=Z0
ADD X1+Y1=Z1
...
where there is no relationship between the microcode instructions.
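In C terms, that second case looks like a plain loop over arrays (again, hypothetical data, just to illustrate the point): every iteration stands alone, so the micro-ops could issue in any order, or even in parallel, unlike the X+Y+Z case above.

/* Element-wise adds over made-up 8-element arrays.
 * No iteration depends on any other iteration's result. */
#include <stdio.h>

int main(void) {
    int x[8] = {1, 2, 3, 4, 5, 6, 7, 8};
    int y[8] = {8, 7, 6, 5, 4, 3, 2, 1};
    int z[8];

    for (int i = 0; i < 8; i++) {
        z[i] = x[i] + y[i];   /* ADD Xi+Yi = Zi -- independent of z[i-1] */
    }

    for (int i = 0; i < 8; i++) printf("%d ", z[i]);
    printf("\n");
    return 0;
}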
Yup, and it's important to note that on the surface, one way is not necessarily better than the other. Sure, the 2nd method might require more steps, but because those steps are simpler, they MIGHT be processed faster than the single, larger 3-way adder instruction...
That book I referenced, if you get into all the math, will help you understand how to do the performance analysis and determine not only HOW to optimize your CPU design (assuming you're designing a processor) but also WHICH parts to optimize and how much of a speedup you'll gain from each.
For example, maybe adding that 3-way instruction takes an extra 10% of silicon on the CPU, but if it's an instruction that would get used a large amount of the time, and the performance gain over the two 2-way adds is big enough, then it might make sense to spend that extra 10%. Then again, it might not...
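Just to make that kind of tradeoff concrete, here's a back-of-the-envelope calculation in that spirit (the numbers are completely made up): if the 3-input adds account for some fraction of the running time and the fused instruction makes that part twice as fast, Amdahl's law gives you the overall speedup.

/* Amdahl's law sketch with hypothetical numbers:
 * overall speedup = 1 / ((1 - f) + f / s)
 * where f is the fraction of time spent in the part you sped up
 * and s is how much faster that part got. */
#include <stdio.h>

int main(void) {
    double f = 0.20;  /* made up: 20% of time spent in 3-input adds      */
    double s = 2.0;   /* made up: fused instruction makes them 2x faster */

    double speedup = 1.0 / ((1.0 - f) + f / s);
    printf("Overall speedup: %.3f\n", speedup);  /* about 1.11x here */
    return 0;
}

Whether an 11% overall speedup is worth 10% more silicon is exactly the kind of question that analysis is for.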
But yeah, with C, about the only things you usually need to worry about are the sizes of things like integers and floats on your given CPU, and usually even that gets abstracted away.
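For instance, something like this (standard C, nothing platform-specific) is about all the "CPU awareness" most C code needs: check sizes with sizeof, or just use the fixed-width types from <stdint.h> and stop thinking about it.

/* The sizes of the basic types can differ between CPUs and compilers,
 * which is about the only low-level detail most C programmers care about. */
#include <stdio.h>
#include <stdint.h>

int main(void) {
    printf("int:    %zu bytes\n", sizeof(int));
    printf("long:   %zu bytes\n", sizeof(long));
    printf("float:  %zu bytes\n", sizeof(float));
    printf("double: %zu bytes\n", sizeof(double));

    /* If you need an exact width regardless of CPU, use fixed-width types. */
    int32_t exactly_32_bits = 0;
    int64_t exactly_64_bits = 0;
    printf("int32_t: %zu, int64_t: %zu\n",
           sizeof(exactly_32_bits), sizeof(exactly_64_bits));
    return 0;
}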
Heck, you can get even farther away from the CPU by using languages like Java that don't even compile to a real CPU architecture, but rather to a "virtual machine" with its own generic "Java CPU" concept that the virtual machine implements. You compile your code to "bytecode", which the virtual machine understands and converts (on the fly, usually using a "just-in-time compiler") into the low-level CPU-specific code. So, you write your Java, it compiles to generic Java bytecode, and the VM is the only thing specific to your CPU + OS.
Anyhow, CPU design and architectures can be a very VERY interesting thing to study, but in a lot of ways, unless you're getting into CPU design for a living, you really don't need to worry too much about it. Understand the basic concepts and the major things like pipelining and superscalar execution, and you'll know more than enough. You could be a very good programmer and not know jack squat about the CPU itself...