Why do people think this is an architecture change? :confused:

All it is is a new technology for the ARM chips already in use in Apple's iOS devices. At most there will be new APIs for developers.
 
If you, as a programmer, still give two craps what chip your code is running on -- well, you suck.

Garbage. Exactly the opposite.

If you are trying to stand out by pushing the envelope in real-time crunching per second (which applies to many of the games, audio, voice processing, image processing and VR apps in the App Store, among others), then knowing precise details about the performance of your chip, and how to tweak your code for it, is absolutely necessary. It's the difference between a smooth 30 or 60 fps game or VR view, and some jerky mess with glitching sound.

Same if you are trying to absolutely minimize battery use per crunch for longer running apps. S*cky are the programmers who don't care about this stuff, and waste the user's battery life as well as contributing to so-called global warming.
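
To make that concrete, here's a rough sketch of the kind of chip-specific tweak I mean - a toy example of mine, not anything from Apple, assuming an ARM core with NEON and a buffer length that's a multiple of four:

#include <arm_neon.h>
#include <stddef.h>

/* Portable version: works everywhere, one sample per iteration. */
void apply_gain_scalar(float *buf, size_t n, float gain)
{
    for (size_t i = 0; i < n; i++)
        buf[i] *= gain;
}

/* NEON version: four samples per iteration in a 128-bit register.
   Leftover samples (n not a multiple of 4) are ignored here to keep
   the sketch short. */
void apply_gain_neon(float *buf, size_t n, float gain)
{
    for (size_t i = 0; i + 4 <= n; i += 4) {
        float32x4_t v = vld1q_f32(buf + i);  /* load 4 floats    */
        v = vmulq_n_f32(v, gain);            /* multiply by gain */
        vst1q_f32(buf + i, v);               /* store them back  */
    }
}

Whether the second version is actually faster, and by how much, depends on exactly the chip-level details I'm talking about.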
 
Apple trademarks everything. I recently saw them trademark the word "automagically". This is ridiculous. Magic Mountain (in Valencia, CA, now Six Flags) has been using this word for the monorail service in their theme park since the '70s! I used to always get a kick out of this line, spoken by a sultry electronic vixen: "Please keep all arms and body parts in the train at all times. The door will close automagically."
 
This doesn't sound like a new idea at all, nor does it sound very important since modern chips do branch prediction to keep the pipelines full and do it well.

To me it just sounds like a case of a strange trademark and will probably not amount to anything. Despite the "A4" and the "A5," Apple isn't a chip design company -- like many companies, they simply license preexisting designs to create custom systems on a chip (SoCs).

Consider, for example, the trace cache, which does a similar thing but based on dynamic execution of the program, and so requires no compiler support.

http://en.wikipedia.org/wiki/NetBurst_(microarchitecture)#Execution_Trace_Cache

Instead of fetching and decoding the instruction again, the CPU directly accesses the decoded micro-ops from the trace cache, thereby saving considerable time. Moreover, the micro-ops are cached in their predicted path of execution, which means that when instructions are fetched by the CPU from the cache, they are already present in the correct order of execution.
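
To make the trace-cache point concrete, here's a trivial C sketch of my own (nothing from the article): the hot loop is the same handful of instructions executed over and over, which is exactly the case where caching already-decoded micro-ops along the predicted path pays off.

/* The loop body is tiny and its backward branch is almost always taken,
   so the decoded micro-ops sit in the trace cache in execution order
   and never need to be re-fetched and re-decoded. */
int checksum(const unsigned char *p, int n)
{
    int sum = 0;
    for (int i = 0; i < n; i++)
        sum += p[i];
    return sum;
}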
 
Would it be silly to think that this project is about creating a Native LLVM processor?

Build the runtime optimiser as a block that could be tied directly to a standard core, or even a bunch of cores. Similar to how Apple uses LLVM to dynamically switch OpenGL code from GPU cores to CPU cores.
 
Would it be silly to think that this project is about creating a Native LLVM processor?

Build the runtime optimiser as a block that could be tied directly to a standard core, or even a bunch of cores. Similar to how Apple uses LLVM to dynamically switch OpenGL code from GPU cores to CPU cores.

Can you provide a reference on your statement about OpenGL? I don't think it's correct.
 
Can you provide a reference on your statement about OpenGL? I don't think it's correct.

From Wikipedia's LLVM article:
The LLVM JIT compiler can optimize unneeded static branches out of a program at runtime, and thus is useful for partial evaluation in cases where a program has many options, most of which can easily be determined unneeded in a specific environment. This feature is used in the OpenGL pipeline of Mac OS X Leopard (v10.5) to provide support for missing hardware features.[5] Graphics code within the OpenGL stack was left in intermediate form, and then compiled when run on the target machine. On systems with high-end GPUs, the resulting code was quite thin, passing the instructions onto the GPU with minimal changes. On systems with low-end GPUs, LLVM would compile optional procedures that run on the local central processing unit (CPU) that emulate instructions that the GPU cannot run internally. LLVM improved performance on low-end machines using Intel GMA chipsets. A similar system was developed under the Gallium3D LLVMpipe, and incorporated into the GNOME shell to allow it to run without a GPU.[6]

Sorry if I've misunderstood what was said here.
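
For what it's worth, here's roughly how I picture the branch-folding part working - a plain C sketch with made-up names, not Apple's actual GL code. The idea is that a function like this ships as LLVM IR, and at run time the JIT knows has_vertex_programs is a constant for the machine it's on, so the untaken branch can be deleted before native code is emitted:

#include <stddef.h>

struct gl_caps { int has_vertex_programs; };

/* Hypothetical stand-ins for "hand the work to the GPU" and
   "emulate one vertex operation on the CPU". */
static void submit_to_gpu(const float *in, float *out, size_t n)
{
    for (size_t i = 0; i < n; i++)
        out[i] = in[i];            /* pretend the GPU did the work */
}

static float emulate_vertex_op(float x)
{
    return x * 2.0f;               /* stand-in for the real math */
}

void run_vertex_stage(const struct gl_caps *caps,
                      const float *in, float *out, size_t n)
{
    if (caps->has_vertex_programs) {
        /* thin path: the GPU supports the feature, just pass it along */
        submit_to_gpu(in, out, n);
    } else {
        /* fallback path: emulate the missing feature on the CPU */
        for (size_t i = 0; i < n; i++)
            out[i] = emulate_vertex_op(in[i]);
    }
}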
 
Microarchitectural enhancements...

This doesn't sound like a new idea at all, nor does it sound very important since modern chips do branch prediction to keep the pipelines full and do it well.

Consider, for example, the trace cache, which does the same thing but based on dynamic execution of the program, and so requires no compiler support.

http://en.wikipedia.org/wiki/NetBurst_(microarchitecture)#Execution_Trace_Cache

You're actually confusing a couple of things here. First of all, the end of a loop is typically not predicted by branch prediction (or even a branch target buffer), but by a special hardware loop counter. (Although I've never seen a loop counter implemented in conjunction with a trace cache, and I don't think x86 supports loop counters.) Secondly, the one situation not handled by any of the methods you've mentioned is the end of a loop, which is precisely what this patent deals with. Branch predictors, branch target buffers and trace caches all do poorly with the loop ending - they end up with an empty pipeline to fill, because they all (rightfully) predict the loop will continue. Granted, you could craft a loop to work with a very specific branch predictor, but that's going beyond the boundaries of safely targeted code.

The flip side of this is that there's a REASON everyone focuses on getting the loop part right - almost all of an application's execution time is inside the loop, and it only exits once, so there's not a lot of motive to try to solve that problem.
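
Here's a minimal sketch of the case I'm describing (my own toy example): a counted loop whose backward branch is taken (n - 1) times and falls through exactly once, so the exit is the one guaranteed misprediction - unless something like a hardware loop counter knows the trip count up front.

float dot(const float *a, const float *b, int n)
{
    float acc = 0.0f;
    for (int i = 0; i < n; i++) {  /* predicted taken; mispredicted only at i == n */
        acc += a[i] * b[i];
    }
    return acc;
}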

Regarding the other posts, I seriously doubt this would be relevant to any other architecture than ARM, and it's probably mostly a microarchitectural enhancement, meaning the ISA wouldn't change, or would change ever so slightly in a way that wouldn't break correctness.

But I just wanted to add - it sounds like their design is much larger in scope than the way we're describing it, and it would have to be to justify such a name. Sounds more like a form of software pipelining in hardware, where you trace through the dependent code chains across loop bodies as fast as possible while letting the independent parts complete in parallel. This could cause a major paradigm shift for high-level parallelism, because a loop body would no longer have to be truly parallel to be parallelized. Just about any loop could take advantage of multiple cores.
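
As a toy illustration of that shape (my own sketch, not anything from the patent): the pointer chase below is a serial, loop-carried chain, while the per-node work is independent and could in principle be overlapped across iterations.

#include <stddef.h>

struct node { struct node *next; float in, out; };

void process_list(struct node *head, float (*heavy)(float))
{
    for (struct node *p = head; p != NULL; p = p->next) { /* dependent chain: pointer chase */
        p->out = heavy(p->in);                            /* independent per-node work */
    }
}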
 
Apple don't design CPUs, they design SoCs

CPUs are part of SoCs.

As for this term, techniques to decompose loops are nothing new. Architectures and compilers have been unrolling loops and doing other things for decades as part of a larger scheme to correctly predict the flow of instructions. I'm very curious to see what Apple could do here that hasn't already been accomplished by an aggressive compiler or processor architecture.
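
For anyone who hasn't stared at compiler output: this is the classic sort of loop decomposition I mean, e.g. unrolling by four to cut branch overhead and expose more parallelism. Toy example of mine, assuming n is a multiple of 4:

void saxpy(float *y, const float *x, float a, int n)
{
    for (int i = 0; i < n; i += 4) {   /* one branch per four elements */
        y[i]     += a * x[i];
        y[i + 1] += a * x[i + 1];
        y[i + 2] += a * x[i + 2];
        y[i + 3] += a * x[i + 3];
    }
}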

Why do people think this is an architecture change? :confused:

All it is is a new technology for the ARM chips already in use in Apple's iOS devices. At most there will be new APIs for developers.

Likely not. Maybe coding guidelines, but this will likely be transparent rather than new functionality - just meant to make existing code scenarios more efficient, if the description is accurate.
 
This certainly sounds interesting...I guess it could have performance implications when you have nested loops. Otherwise it seems the performance benefit would be minuscule as it will only be improving the pipeline when a loop completes.
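
Quick illustration of what I mean by nesting (toy example of mine): the inner loop below exits once per row, so whatever penalty the pipeline pays at a loop exit gets paid rows times, not just once.

void scale_matrix(float *m, int rows, int cols, float s)
{
    for (int r = 0; r < rows; r++) {
        for (int c = 0; c < cols; c++) {  /* exit mispredicted once per row */
            m[r * cols + c] *= s;
        }
    }
}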
 
Now this is the cool stuff I want to hear more about. :)

Since Apple seems to be gearing up to trademark the term, you could be hearing more about this very, very soon.

If this is what I suspect it is, Apple is looking for multiple ways to efficiently have higher performance portable devices with longer battery life.
 
This certainly sounds interesting...I guess it could have performance implications when you have nested loops. Otherwise it seems the performance benefit would be minuscule as it will only be improving the pipeline when a loop completes.

It's likely more than that. You don't come up with a big-sounding name like Macroscalar for a loop decomposition technique.
 
If you, as a programmer, still give two craps what chip your code is running on -- well, you suck. That would also explain why barista is an acceptable move, salary-wise.

Abstract, brother, abstract. Use blocks, adapter objects, build lightweight APIs around your process-intensive work and use the core APIs wherever possible -- and advances in chips become basically free. Stop ice skating uphill.
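
Something like this, in plain C terms - a minimal sketch with made-up names, not any real API. Callers only ever see process_buffer(); which implementation runs behind it is an internal detail that can change whenever the chip does.

#include <stddef.h>

typedef void (*kernel_fn)(float *buf, size_t n);

/* Portable implementation; a NEON- or GPU-backed version could be
   swapped in at startup on hardware that has one. */
static void kernel_portable(float *buf, size_t n)
{
    for (size_t i = 0; i < n; i++)
        buf[i] *= 0.5f;
}

static kernel_fn active_kernel = kernel_portable;

/* The whole public surface the rest of the app sees. */
void process_buffer(float *buf, size_t n)
{
    active_kernel(buf, n);
}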

This is correct if you're using Java, but if you're coding in C then you better care about the chip your code is running on. Statements like this just make you seem like the sucky programmer.
 
Since Apple seems to be gearing up to trademark the term, you could be hearing more about this very, very soon.

If this is what I suspect it is, Apple is looking for multiple ways to efficiently have higher performance portable devices with longer battery life.

You think?
 
That's actually very interesting. I suppose they did this because the original Intel GMA lacked support for vertex programs, so that part of the OpenGL stack had to run on the CPU.

Footnote 5 points to an Apple engineer on the LLVM mailing list talking about that sort of use.
 
Apple trademarks everything. I recently saw them trademark the word "automagically". This is ridiculous.

Yep, it certainly sounds ridiculous. Can you point to any source for that? Would be interesting.

----------

Garbage. Exactly the opposite.

If you are trying to stand out by pushing the envelope in real-time crunching per second (which applies to many of the games, audio, voice processing, image processing and VR apps in the App Store, among others), then knowing precise details about the performance of your chip, and how to tweak your code for it, is absolutely necessary. It's the difference between a smooth 30 or 60 fps game or VR view, and some jerky mess with glitching sound.

If you are writing in assembler, or writing your own OS, then you are right.
If you aren't, then you are exaggerating a tiny little bit, aren't you?

Same if you are trying to absolutely minimize battery use per crunch for longer running apps. S*cky are the programmers who don't care about this stuff, and waste the user's battery life as well as contributing to so-called global warming.

Interesting that for most of what you say I'd first think about correctly using the APIs, and the compiler, and the OS, instead of trying to do everything yourself and fighting the system (and the rest of its users). You know, "good citizen" vs "commando".

And a green-washed commando at that :p

----------

This is correct if you're using Java, but if you're coding in C then you better care about the chip your code is running on. Statements like this just make you seem like the sucky programmer.

The people who built the first Unixes for different architectures in portable C beg to differ. Or maybe they were sucky too?
 
FINALLY! The PowerBook G5 CPU has arrived!

Apple don't design CPUs, they design SoCs

Please name one "SoC" Apple has ever designed. SoC is a silly, arbitrary term, and it's basically still a CPU, just with a few more features that historically haven't been on-die. L2 cache wasn't even on-die for a very long time.

----------

Why do people think this is an architecture change? :confused:

All it is is a new technology for the ARM chips already in use in Apple's iOS devices. At most there will be new APIs for developers.

If it's been in development since 2004, though, that might rule out ARM chips. Unless it was only the theory they were developing, and could implement it in any chip. Historically, Apple hasn't had much (hardly anything) to do with CPU design; they've stuck more to ancillary efforts like ICs and the like. (Yes, I know about the AIM alliance...)
 
Is it just me or is Macroscalar not really a marketable term?

I know next to nothing about all the technical stuff you've been discussing, so it was an interesting (and partially comprehensible) thread to read through. But I don't see Apple using the term Macroscalar as they have used Retina.

Just some food for thought here: Apple's mission has always been about bringing the "power to the masses", like they did with GarageBand, iMovie and iBooks Author. Will they not someday move towards a closed environment where people create their software with iProgram and publish it exclusively to the Mac App Store? I wouldn't want this to happen, but is there even the slightest chance Apple would ever want this?
 
Is it just me or is Macroscalar not really a marketable term?

Is LLVM a marketable term? OpenCL? Even Grand Central?

I know next to nothing about all the technical stuff you've been discussing, so it was an interesting (and partially comprehensible) thread to read through. But I don't see Apple using the term Macroscalar as they have used Retina.

Why should they?

Will they not someday move towards a closed environment where people create their software with iProgram and publish it exclusively to the Mac App Store? I wouldn't want this to happen, but is there even the slightest chance Apple would ever want this?

Yep.
There is also "even the slightest chance" that the world will end tomorrow, FWIW.
 