See? I told you that I didn’t know him. Hello Jack, glad to meet you!
Then what is your point?
SPARC is RISC. Isn't RISC all that matters?
My understanding was that CISC was chosen because it's faster: it can run multiple operations at a time, while RISC could run just one operation at a time. RISC was used mainly in devices that were more like appliances, largely due to its low power usage. I mean, even on the *nix side of things they mainly use AMD and Intel, which are CISC, no Windows needed there. It doesn't help that Apple abandoned IBM PowerPC (RISC) and switched architectures to Intel (CISC) just because PowerPC was not delivering enough performance.
Whaaa??? You mean there were other factors? I wish I had known that there are other factors at play besides RISC versus CISC.
If only I had known we could have avoided all of this. Oh, wait.
So, IBM was only interested in high-performance variants that consumed more and more power? Really? You mean they wouldn't be interested in developing higher-performance processors that used equal or less power? They were solely focused on higher-performance at more and more power?
Motorola lost interest in capturing a market with higher-performing processor designs? Seems like a very poor business decision to ignore such a huge market.
Which is why Apple created its monster architecture with 8 decoders and a giant reorder buffer and unlike most ARM it uses micro-ops; just not in the way x86 does?

RISC can absolutely run more than one operation at a time. In fact, it was RISC processors that pioneered superscalar functionality, and it's far easier to do parallelism with RISC - that's something addressed in the article linked in the original post in this thread. This is because there's a 1:1 mapping of micro-ops to ISA instructions in RISC, unlike in CISC, and RISC instructions are almost always decoded in a single pipe stage.

So when you create your reorder buffer/reservation stations/superscalar mapper, it's far easier to keep track of things so that when a branch prediction is missed and you need to unwind, or a cache miss occurs, or whatever, you don't have to figure out which set of micro-ops corresponds to a single ISA instruction. It makes life easier in a lot of ways, and also means it's much faster to get instructions *into* the reorder buffer/reservation stations. In CISC you can be limited by the bandwidth getting data into the buffer. In RISC you are more typically limited by the time it takes to identify dependencies (i.e., that one instruction is dependent on the result of another, so they have to issue in a certain order).
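To make the unwind point concrete, here is a toy C sketch (purely illustrative, not any real design): with a 1:1 micro-op-to-instruction mapping, every reorder-buffer entry is a legal flush boundary, whereas with a 1:N mapping you first have to find the start of an instruction's micro-op group.

#include <stdio.h>

typedef struct {
    int isa_id;   /* which ISA instruction this entry came from           */
    int uop_seq;  /* position within that instruction (always 0 with 1:1) */
} RobEntry;

/* Find the first entry younger than bad_isa_id that starts an instruction;
   with a 1:1 mapping every entry has uop_seq == 0, so the check is trivial. */
static int flush_from(const RobEntry *rob, int n, int bad_isa_id) {
    for (int i = 0; i < n; i++)
        if (rob[i].isa_id > bad_isa_id && rob[i].uop_seq == 0)
            return i;
    return n;  /* nothing younger to flush */
}

int main(void) {
    /* Entry 0 is one instruction; entries 1-3 are three micro-ops of another.
       Pretend instruction 0 was the mispredicted branch. */
    RobEntry rob[] = { {0, 0}, {1, 0}, {1, 1}, {1, 2} };
    printf("flush starting at entry %d\n", flush_from(rob, 4, 0));
    return 0;
}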
The Mac is not having a hard time surviving. Apple sold $9 billion worth in the last quarter alone. Does that sound like a "struggling" product? The Mac user base is currently at 140 million and continues to grow as half of Macs sold today are to people new to the Mac.
The fact is, Apple only makes a handful of computers in a few form factors at the higher end of the market. This naturally limits broader appeal.
Parallelism isn't a RISC/CISC issue.
Given this, any guesses how Apple could scale this design into a modular/upgradeable Mac Pro?

It actually is a system on a chip. The term is used to refer to a design methodology where blocks are independently designed and communicate on a bus rather than via specialized connections. And each of the circuits mentioned (CPU, GPU, neural engines, secure store, memory controller, etc.) is indeed on the same silicon chip. The only thing that's on a separate chip in the package is the RAM.
SoC is used to differentiate from "ASIC," which is a methodology where each of those blocks would have its own unique interface, and changes made to one block would ripple into the others.
Micro-ops are not the same as microcode, which is, I think, confusing people. Micro-ops are just the internal representation of the instruction. The opcode, for example, may be broken into multiple additional signals so that downstream steps don't need to go to the trouble of doing it. Think of micro-ops, essentially, as just a different version of the instruction.

x86, by contrast, often maps a single instruction into a large set of microcode instructions that need to be performed in a particular sequence. And to figure that out, the decoder has to keep guessing whether a particular set of bits is the start of one instruction or a different instruction. So what we often do is proceed as if it's BOTH, and then throw away the work we did for the other. And since instructions may not be aligned to memory lines, you have to keep fetching things and then glue things together, which is part of the reason you need a state machine (am I looking for the start of an instruction? Am I looking to see if I need to keep fetching parts of the instruction? Am I looking for the end of an instruction?).

Which is why Apple created its monster architecture with 8 decoders and a giant reorder buffer, and unlike most ARM it uses micro-ops; just not in the way x86 does?
I didn't see ARM listed on his resume.

ARM seems to be doing well.
Two of which are dead, one of which is nearly dead, and the final one which (according to him) is a poor design. It doesn't appear to me that this is something to brag about.

I don't know cmaier from jack, but it's obvious that he has designed chips and worked with both x86 and RISC designs, something other posters here on MacRumors have verified. What are your credentials?
So IBM's goal was to create processors that consumed more and more power? Seems like an odd decision. Oh, I've worked for a number of different companies. Having been an employee of these companies doesn't make me an authority on their strategy.

I worked at IBM, and yes - that is an accurate statement. IBM did not want to play in the consumer desktop/laptop market at that time. They were more interested in supporting data centers and the corporate server market. What Apple wanted to do with the processors supporting their products was vastly different from what IBM/Motorola had in mind.
You are correct. I fubared on that one. I read your subsequent post responding to another poster and realized I was incorrect.
I always thought RISC just crunched through single instructions at a really fast rate - can you describe how a RISC chip (ARM) runs instructions in parallel? I would find that interesting.
If there is a link to something that would save you time - that would work as well.
No - you are being obtuse - a business decision was made to focus on a market segment where power efficiency was not one of the leading features. When you are on the business side of the organization and are in meetings where such discussions and decisions are made, you can comment on this.

So IBM's goal was to create processors that consumed more and more power? Seems like an odd decision. Oh, I've worked for a number of different companies. Having been an employee of these companies doesn't make me an authority on their strategy.
So the article is incorrect in citing the use of micro-ops by Apple Silicon? I did get the general point of RISC micro-ops being more uniform, so that Apple, with the WIDE pipe and huge caches and buffers, is simply processing a lot more instructions at a time than the others.

Micro-ops are not the same as microcode, which is, I think, confusing people. Micro-ops are just the internal representation of the instruction. The opcode, for example, may be broken into multiple additional signals so that downstream steps don't need to go to the trouble of doing it. Think of micro-ops, essentially, as just a different version of the instruction.

x86, by contrast, often maps a single instruction into a large set of microcode instructions that need to be performed in a particular sequence. And to figure that out, the decoder has to keep guessing whether a particular set of bits is the start of one instruction or a different instruction. So what we often do is proceed as if it's BOTH, and then throw away the work we did for the other. And since instructions may not be aligned to memory lines, you have to keep fetching things and then glue things together, which is part of the reason you need a state machine (am I looking for the start of an instruction? Am I looking to see if I need to keep fetching parts of the instruction? Am I looking for the end of an instruction?).
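As a purely illustrative toy (the encoding below is made up and is not x86), here is the boundary-finding problem in C: with fixed-width instructions the next instruction always starts four bytes later, while with variable-length instructions you have to partially decode each one before you know where the next begins.

#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Toy variable-length encoding: the low 2 bits of the first byte give the
   instruction length (1-4 bytes). A real x86 decoder is far messier, but the
   point is the same: you must inspect each instruction to find the next one. */
static size_t next_boundary(const uint8_t *mem, size_t pos) {
    return pos + (size_t)((mem[pos] & 0x3) + 1);
}

int main(void) {
    uint8_t code[] = { 0x02, 0xAA, 0xBB,   /* 3-byte instruction */
                       0x00,               /* 1-byte instruction */
                       0x01, 0xCC };       /* 2-byte instruction */
    size_t pos = 0;
    while (pos < sizeof code) {
        size_t next = next_boundary(code, pos);
        printf("instruction at byte %zu, length %zu\n", pos, next - pos);
        pos = next;  /* with fixed 4-byte instructions this would just be pos + 4 */
    }
    return 0;
}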
Given this, any guesses how Apple could scale this design into a modular/upgradeable Mac Pro?
As I'm sure you know, Apple publicly recognized that, to serve the pro market, they need their pro desktop offering to be modular, resulting in the last Mac Pro. One of the big complaints among pros about the previous trashcan model is that GPU design progressed rapidly, and that model didn't allow pros to upgrade their GPUs to keep up.
But a key to Apple's new design seems to be extensive CPU-GPU integration, thus I don't see how they could allow for GPU-only upgrades without giving that up. Perhaps Apple's answer to pros' modularity requirements would be to allow the addition of new CPU-GPU SoC modules (so you couldn't buy just more GPU power alone). Those modules would then have to be linked to each other. The inter-module communication speed would be slower than intra-module, thus there would need to be sophisticated traffic control to distribute computational load in a way that minimizes total processing time (i.e., keep as much as possible within each module, passing between modules only as necessary; a toy sketch of this idea follows below), just as is, I assume, currently done for different nodes in supercomputers.
I suppose they might be able to offer supplementary CPU-GPU modules that are especially GPU-heavy.
One thing I don't know is whether the sorts of multi-threaded applications used by Pros can distribute their threads across multiple separate processors, or only across multiple cores in the same processor.
Or perhaps they will pursue a hybrid approach for the pros -- a fully integrated central CPU-GPU chip, and then separate GPU-only modules that, while they wouldn't offer the CPU-GPU integration benefits of the central chip, would provide significant added GPU processing power.
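For what it's worth, here is a toy C sketch of the "keep work local" idea speculated about above. The module layout and cost numbers are invented, not anything Apple has announced: each task goes to the module that already holds its data unless the other module is idle enough to absorb the transfer penalty.

#include <stdio.h>

#define NMODULES  2
#define XFER_COST 5   /* made-up penalty for moving a task's data between modules */

typedef struct { int work; int home; } Task;  /* home = module holding its data */

int main(void) {
    Task tasks[] = { {10, 0}, {8, 0}, {9, 0}, {4, 1} };
    int load[NMODULES] = { 0, 0 };

    for (int i = 0; i < 4; i++) {
        int home  = tasks[i].home;
        int other = 1 - home;
        /* Prefer the module that already holds the data; only migrate the task
           if the other module is idle enough to be worth the transfer cost. */
        int dest = (load[other] + XFER_COST < load[home]) ? other : home;
        load[dest] += tasks[i].work;
        printf("task %d -> module %d\n", i, dest);
    }
    printf("final load: %d %d\n", load[0], load[1]);
    return 0;
}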
SoC design? I wish I would have figured that out: it's rather Apple placing everything in one package, including specialized processing units.

If you read the source article, no - it's the implementation approach of RISC/ARM and SoC designs that made the difference. RISC just offers more flexibility. Now that clock speeds cannot be ramped up at will -- the weaknesses in the CISC architecture are becoming more apparent.
It’s actually pretty simple. Each core has multiple ALUs, each of which has an adder, shifter, multiplier (depending on the microarchitecture), etc.
So if the instruction stream looks like this:
ADD R0, R1 -> R2
ADD R3, R4 -> R5
ADD R2, R5 -> R6
Then you build a little scoreboard. The first instruction depends on R0 and R1 and creates R2.
The next instruction does not depend on R2, so it can be issued at the same time as the first instruction.
The last instruction depends on R2 and R5. Since instructions that came before the last instruction change the values of both of these, and since they have not been issued into the pipeline yet, we cannot issue the last instruction until the first two issue. In fact, we probably can't issue it until they have mostly completed (you don't actually wait until the first two instructions modify R2 and R5 - it's good enough that they calculated the results, and then you can short-circuit them into the appropriate slots for the third instruction).
One other wrinkle is that when the instruction refers to registers like “R1” or “R2” that doesn’t mean we have a specific register that is always ”R1.” We have a large collection of registers, and we use whatever one is available. There can be multiple versions of R1 floating around, depending on the sequence of instructions. This is called “register renaming“ (https://en.wikipedia.org/wiki/Register_renaming) and everyone does some version of this.
There are LOTS of ways to do this. I owned the scheduler on a SPARC design, and we used reservation stations (https://en.wikipedia.org/wiki/Reservation_station). But there are other techniques as well.
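Here is a minimal C sketch of the scoreboard idea described above (a toy, not the SPARC scheduler or any other real design; register renaming and short-circuit forwarding are omitted). It walks the three ADDs and reports which can issue immediately and which must wait for an in-flight producer.

#include <stdbool.h>
#include <stdio.h>

#define NREGS 8

typedef struct { int src1, src2, dst; } Instr;

int main(void) {
    /* ADD R0,R1->R2 ; ADD R3,R4->R5 ; ADD R2,R5->R6 */
    Instr prog[] = { {0, 1, 2}, {3, 4, 5}, {2, 5, 6} };
    bool pending[NREGS] = { false };  /* true = an in-flight instruction will write this register */

    for (int i = 0; i < 3; i++) {
        bool ready = !pending[prog[i].src1] && !pending[prog[i].src2];
        printf("instruction %d: %s\n", i, ready ? "issue now" : "wait for operands");
        pending[prog[i].dst] = true;  /* its result is now in flight */
    }
    return 0;
}

In this toy version instructions 0 and 1 come out as issuable together and instruction 2 has to wait, which matches the walk-through above.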
Nothing obtuse about my statement. You said they decided to focus on developing processors which consumed more and more power.

No - you are being obtuse - a business decision was made to focus on a market segment where power efficiency was not one of the leading features. When you are on the business side of the organization and are in meetings where such discussions and decisions are made, you can comment on this.
So the article is incorrect in citing the use of micro-ops by Apple Silicon? I did get the general point of RISC micro-ops being more uniform, so that Apple, with the WIDE pipe and huge caches and buffers, is simply processing a lot more instructions at a time than the others.
This same explanation applies to CISC as well; it is not limited to RISC.

OK, that makes sense. Thanks for the information. Informative.
SoC design? I wish I would have figured that out: it's rather Apple placing everything in one package, including specialized processing units.
Geez people, do read what people are saying instead of just dog piling on.
This same explanation applies to CISC as well; it is not limited to RISC.
Thank you sir!

I'm sure they use something that could be called "micro-ops." That doesn't mean they use microcode, is my point. But a micro-op could just be, for example, a fixed 82-bit field that expands on the 64 bits of the normal instruction by adding convenience bits like "this instruction needs the ALU" or "this instruction needs the floating point unit" or whatever. These are things that could be done later in the pipeline, but sometimes knowing them early on makes it easier to schedule the instructions, provide hints to the branch prediction unit, etc.
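As a purely hypothetical illustration (the field names, widths, and encoding here are invented, and this is not Apple's actual micro-op format), the idea looks something like this in C: the decoder computes a few convenience flags once, and later pipeline stages read those flags instead of re-examining the opcode.

#include <stdint.h>

/* Invented micro-op layout: the original instruction plus pre-decoded hints. */
typedef struct {
    uint32_t raw;              /* the encoded instruction as fetched          */
    unsigned needs_alu  : 1;   /* hint: integer ALU required                  */
    unsigned needs_fpu  : 1;   /* hint: floating-point unit required          */
    unsigned is_branch  : 1;   /* hint for the branch-prediction machinery    */
    uint8_t  src1, src2, dst;  /* register numbers pulled out of the encoding */
} MicroOp;

/* Toy decode step using a made-up encoding (low 4 bits = operation class). */
static MicroOp decode(uint32_t instr) {
    MicroOp u = { .raw = instr };
    uint32_t opclass = instr & 0xF;
    u.needs_alu = (opclass <= 0x7);
    u.needs_fpu = (opclass >= 0x8 && opclass <= 0xB);
    u.is_branch = (opclass >= 0xC);
    u.src1 = (instr >> 4)  & 0x1F;
    u.src2 = (instr >> 9)  & 0x1F;
    u.dst  = (instr >> 14) & 0x1F;
    return u;
}

int main(void) {
    MicroOp u = decode(0x00008425u);  /* arbitrary toy encoding */
    return u.is_branch;               /* 0 here; just uses the result */
}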
I'm far from a business expert, but I have to imagine Intel's bottom line will certainly be affected as time moves on, with Apple using their own CPUs and not Intel's.

With Macs stubbornly stuck at 8% market share for the last two decades, Intel has nothing to worry about. And AMD has never been, and never would end up, in a Mac anyway. People are pretty much set in their ways: Camp Windows or Camp Mac.