Surely there isn't going to be just "a" chip, but various chips? If you look at an AMD or Intel chip launch, there is normally a family of chips aimed at different kinds of hardware in terms of performance and price.

When Apple releases Arm-based machines, is it likely to want to keep offering the same hardware at different price points, with ARM chips of different specs, more RAM, larger SSDs, and different graphics options? I would presume so.
 
Lol.. the point was I'd trust Stack Overflow over some random dude on these forums. Nobody uses -O3; the benefit is tiny and it carries some added risk. Even the speed benefit of -O2 over -O1 in production is small, and I work on very large software packages.
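For anyone curious, here's a minimal sketch of the kind of comparison I mean, assuming clang or gcc (the file name and the toy loop are made up for illustration; only the -O flags themselves are standard):

/* opt_levels.c - toy hot loop for comparing optimization levels.
 * Hypothetical build commands:
 *   cc -O1 opt_levels.c -o sum_o1
 *   cc -O2 opt_levels.c -o sum_o2
 *   cc -O3 opt_levels.c -o sum_o3
 * -O2 and -O3 mostly differ in how aggressively the compiler inlines
 * and vectorizes; for straightforward code like this the measured
 * difference is often small, which is the point made above.
 */
#include <stdio.h>
#include <stdlib.h>

static double sum(const double *v, size_t n) {
    double s = 0.0;
    for (size_t i = 0; i < n; i++)
        s += v[i];              /* simple reduction loop */
    return s;
}

int main(void) {
    size_t n = 10000000;
    double *v = malloc(n * sizeof *v);
    if (!v)
        return 1;
    for (size_t i = 0; i < n; i++)
        v[i] = (double)i;       /* fill with predictable data */
    printf("%f\n", sum(v, n)); /* time this under each build */
    free(v);
    return 0;
}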

Then you will probably be happy to know that there should not be a significant difference between the ARM and x64 binaries in your case.
 
All this is irrelevant. The main task of the DTK is to test your builds, not do the actual build. You can do the actual build on any system with Xcode installed.
Unless you are writing a compiler ;-)
You mean the iPad Pro with the A12X, not the A12Z, right?
Same chip; the only difference is an extra GPU core (which was probably there, but fused off, on the A12X)
 


While the terms and conditions for Apple's new "Developer Transition Kit" forbid developers from running benchmarks on the modified Mac mini with an A12Z chip, it appears that results are beginning to surface anyhow.

[Image: Apple Developer Transition Kit box. Image Credit: Radek Pietruszewski]

Geekbench results uploaded so far suggest that the A12Z-based Mac mini has average single-core and multi-core scores of 811 and 2,781 respectively. Keep in mind that Geekbench is running through Apple's translation layer Rosetta 2, so an impact on performance is to be expected. Apple also appears to be slightly underclocking the A12Z chip in the Mac mini to 2.4GHz versus nearly 2.5GHz in the latest iPad Pro models.

[Image: Geekbench results for the A12Z Mac mini under Rosetta 2]

It's also worth noting that Rosetta 2 appears to only use the A12Z chip's four "performance" cores and not its four "efficiency" cores.

By comparison, iPad Pro models with the A12Z chip have average single-core and multi-core scores of 1,118 and 4,625 respectively. This is native performance, of course, based on Arm architecture.


Article Link: Rosetta 2 Benchmarks Surface From Mac Mini With A12Z Chip

This is not a good debut for the lowest-end performance already. Even if the scores mirror two-year-old performance, this is comparable to the slowest Athlon dual-core on the AM4 socket. Folks poking fun at this benchmark don't care, but I do for my photography work.
Remember that the A12 is two years old. It is not part of the "Family of CPUs for desktop" that will be in the first Apple Silicon Macs.

I really hope not, but if game developers are going to make Mac games on this machine, they are going to give up. And not just some boring arcade stuff, but triple-A titles.
 
For what it's worth, there is no translation *layer*. The binary gets translated at install time and then runs *natively*. The reason lower performance should be expected is the smaller instruction set, meaning some single x86 instructions get translated into multiple ARM instructions.
 
This is not a good debut for the lowest-end performance already. Even if the scores mirror two-year-old performance, this is comparable to the slowest Athlon dual-core on the AM4 socket. Folks poking fun at this benchmark don't care, but I do for my photography work.

Not sure we are looking at the same benchmarks here? The DTK scores not far from a 2020 quad-core MacBook Air running Ice Lake. And this is translated (non-native) Geekbench.

Athlon 3000G multi-core performance is significantly lower, and if you are looking at the single-core performance... please don't forget you have a 35W desktop CPU running at 3.5GHz on one hand and an iPad CPU running at 2.4GHz on the other...
The reason lower performance should be expected is the smaller instruction set, meaning some single x86 instructions get translated into multiple ARM instructions.

And that the translated code probably needs to perform additional work to ensure correctness.
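One commonly cited example of that extra correctness work (my own illustration, a memory-ordering scenario rather than anything confirmed about Rosetta 2's internals): x86 guarantees stronger ordering between ordinary loads and stores than the baseline ARM memory model, so translated code has to use explicitly ordered instructions or barriers, or lean on hardware support for x86-style ordering. In C11-atomics terms, the difference looks roughly like this:

/* ordering_sketch.c - illustration only (hypothetical file name).
 * On x86, the plain store to "ready" and the plain load of it already
 * behave roughly like release/acquire, so x86 binaries get this
 * ordering for free. On ARM, the same guarantee requires explicitly
 * ordered instructions (stlr/ldar) or barriers, which is extra work a
 * translator must account for unless the hardware offers an x86-like
 * ordering mode.
 * Build with: cc -pthread ordering_sketch.c
 */
#include <pthread.h>
#include <stdatomic.h>
#include <stdio.h>

static atomic_int payload = 0;
static atomic_int ready = 0;

static void *producer(void *arg) {
    (void)arg;
    atomic_store_explicit(&payload, 42, memory_order_relaxed);
    /* Release store: publishes "payload" before the flag becomes visible. */
    atomic_store_explicit(&ready, 1, memory_order_release);
    return NULL;
}

static void *consumer(void *arg) {
    (void)arg;
    /* Acquire load: pairs with the release store above. */
    while (!atomic_load_explicit(&ready, memory_order_acquire))
        ; /* spin until the producer sets the flag */
    printf("payload = %d\n", atomic_load_explicit(&payload, memory_order_relaxed));
    return NULL;
}

int main(void) {
    pthread_t p, c;
    pthread_create(&p, NULL, producer, NULL);
    pthread_create(&c, NULL, consumer, NULL);
    pthread_join(p, NULL);
    pthread_join(c, NULL);
    return 0;
}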
I really hope not, but if game developers are going to make Mac games on this machine, they are going to give up. And not just some boring arcade stuff, but triple-A titles.

It is uncommon for game developers to target Mac as a primary platform in the first place. ARM Macs will be a much better gaming platform, but it will require more work to take advantage of the Apple hardware.
 
For what it's worth, there is no translation *layer*. The binary gets translated at install time and then runs *natively*. The reason lower performance should be expected is the smaller instruction set, meaning some single x86 instructions get translated into multiple ARM instructions.

Almost, but not quite. Speed has little to do with the number of instructions. When one x86 instruction is translated to three ARM instructions, it does not take Arm 3x as long to run them. It would likely take the x86 three passes through the ALUs to execute the complex instruction anyway. And the Arm can sometimes do it in fewer passes because x86 will include memory accesses in the instruction and has fewer registers, whereas Arm has lots of registers and can coalesce memory stores.

The reason there is a penalty due to Rosetta is that the translation from x86 to Arm is not perfectly smart. You are taking code that was optimized for x86 (in terms of the order of instructions, which instructions were chosen, etc.) and translating it without access to the source code. The result is NOT what you would get if you took the same source code and simply compiled it for Arm. It will be less efficient, executing instructions that may not be required, not using Arm instructions that might be faster than the Arm instructions that directly correspond to the x86 instructions, etc.
 
This is not a good debut for the lowest-end performance already. Even if the scores mirror two-year-old performance, this is comparable to the slowest Athlon dual-core on the AM4 socket. Folks poking fun at this benchmark don't care, but I do for my photography work.


I really hope not, but if game developers are going to make Mac games on this machine, they are going to give up. And not just some boring arcade stuff, but triple-A titles.

Your concerns are exactly why Apple rightly banned public benchmark tests on this prototype in their NDA; it's not actually going to be like this, and no, this isn't the lowest end. It has nothing to do with any of the upcoming retail Macs with Apple SoCs.

Apple has already said several times that the upcoming Apple Silicon Macs will have their own dedicated line of SoCs. It is NOT going to be the A12Z; they're simply reusing the fastest production SoC they have, which is the A12Z from the iPad Pro (sort of, since the A13 is already faster, but they didn't ship an X/Z version of the A13). The A12Z is designed for tablets only: low power, long battery life. The A13 is already 20% faster across the board.

The A12Z is nowhere close to any of the actual Mac chips because Macs have a much higher thermal budget than any thin tablet. They will reveal a new A14 generation on 5nm, and the Mac versions of the A14 will aim for a more balanced power/battery profile in laptops and go high-power for desktop/iMac units.

Your Athlon example is 35W, the A12Z is 5-8W. Not even a valid comparison by any stretch, especially since the benchmark was emulated and about 30% slower.
 
Do you think Affinity Designer or Photoshop on iPad are tiny consumer apps?
Designer is great! Photoshop - eh. ;-)
Your concerns are exactly why Apple rightly banned public benchmark tests on this prototype in their NDA; it's not actually going to be like this, and no, this isn't the lowest end. It has nothing to do with any of the upcoming retail Macs with Apple SoCs.

Apple has already said several times that the upcoming Apple Silicon Macs will have their own dedicated line of SoCs. It is NOT going to be the A12Z; they're simply reusing the fastest production SoC they have, which is the A12Z from the iPad Pro (sort of, since the A13 is already faster, but they didn't ship an X/Z version of the A13). The A12Z is designed for tablets only: low power, long battery life. The A13 is already 20% faster across the board.

The A12Z is nowhere close to any of the actual Mac chips because Macs have a much higher thermal budget than any thin tablet. They will reveal a new A14 generation on 5nm, and the Mac versions of the A14 will aim for a more balanced power/battery profile in laptops and go high-power for desktop/iMac units.

Your Athlon example is 35W, the A12Z is 5-8W. Not even a valid comparison by any stretch, especially since the benchmark was emulated and about 30% slower.

I designed that Athlon dual-core chip. Stop dissing it. :)
 
This is not a good debut for the lowest-end performance already. Even if the scores mirror two-year-old performance, this is comparable to the slowest Athlon dual-core on the AM4 socket. Folks poking fun at this benchmark don't care, but I do for my photography work.

People aren't poking fun at the benchmark; they are poking fun at the people who think the benchmarks have any bearing whatsoever on the chips Apple is actually going to use in the devices it releases.

Anyone who is trying to gauge the performance of unreleased chips based on an underclocked iPad chip deserves ridicule.
 
You are Jim Keller? 😳 It’s an honour to have you on these forums, sir!
Jim Keller architected only the HyperTransport bus (then called the "LDT bus") on that chip. I designed the integer execution units, floating-point unit, and scheduler/register renamer (at various points), as well as the floor-planning methodology, buffer insertion scheme, clock gating scheme, power grid scheme, block pin assignment scheme, etc.
 
Almost, but not quite. Speed has little to do with the number of instructions. When one x86 instruction is translated to three ARM instructions, it does not take Arm 3x as long to run them. It would likely take the x86 three passes through the ALUs to execute the complex instruction anyway. And the Arm can sometimes do it in fewer passes because x86 will include memory accesses in the instruction and has fewer registers, whereas Arm has lots of registers and can coalesce memory stores.

The reason there is a penalty due to Rosetta is that the translation from x86 to Arm is not perfectly smart. You are taking code that was optimized for x86 (in terms of the order of instructions, which instructions were chosen, etc.) and translating it without access to the source code. The result is NOT what you would get if you took the same source code and simply compiled it for Arm. It will be less efficient, executing instructions that may not be required, not using Arm instructions that might be faster than the Arm instructions that directly correspond to the x86 instructions, etc.
Isn't the main purpose of adding new instructions to make certain tasks faster? I think this HW optimization would yield much better performance gains than SW optimizations during compile time.
 
Isn't the main purpose of adding new instructions to make certain tasks faster? I think this HW optimization would yield much better performance gains than SW optimizations during compile time.

These two sentences don't seem to be connected.

Compilers are optimized to convert developers' intent into assembly language instructions. If you then convert those instructions to another assembly language, it's like making a photocopy of a photocopy. You lose information each time.

A compiler does a better job of optimizing code than a hardware translator, because (1) it has more information about designer intent and (2) it can take as much time as it wants to do so.

So, for example, if I have code like:

a = a + 1

In Intel code it might do:

add [memory a] 1 -> [memory a]

That's one instruction but it takes a lot of time because it accesses memory twice (once for a read, once for a write).

Rosetta might translate that to Arm like:

load r0, [memory a]
add r0, 1 -> r0
store r0, [memory a]

This would take exactly the same amount of time as the Intel instruction.

But a compiler that understands Arm may instead have simply done:

add r0, 1 -> r0

That would be much smarter. The load/store were there because x86 doesn't have a lot of registers. Not necessary on Arm.


These sorts of optimizations CAN be done by Rosetta, but it gets harder and harder the more you try to do it, and it will likely never be as good as just compiling to Arm in the first place.
 
I work at Autodesk and have the source code for both Maya and Fusion 360 (two of our Mac products) on my system.

So tell us about the version of Maya that CraigF demo’d to us all running on the A12Z chip.
It looked pretty good to me.
I guess you know the person who compiled it...

And yes, as others have mentioned, the DTK is not for compiling on.

I'm sure you know that the Maya 2020 Mac edition has a minimum 8GB RAM requirement, with only 16GB recommended. So the DTK is perfect for the job.
 
I can see why Apple didn't want people doing benchmarks on these things. People are talking as though this transition kit is going to be representative of actual hardware. There's a reason it's called a transition kit: it's just to make sure your software is ready to run on Apple Silicon hardware when it finally comes out.
 
Common sense also says that Apple will package two or four of these chips into one package, at which point we will have a huge improvement for all the Macs with four to eight cores. We will have a huge improvement running x86 code through Rosetta, and native code will fly.
The difference is enough to add another 12 or 14 processors (not cores, processors) to the ARM Mac mini.

Other than on high-end Macs, I don't see the point of having multiple processor packages or that many cores. That's nice on a Mac Pro where some workloads actually make use of them, but most people rarely run such workloads. Those cores will be idling a lot.

Much more useful to take approaches like Turbo Boost, or heterogeneous cores like big.LITTLE or Fusion.
This is not a good debut for the lowest-end performance already. Even if the scores mirror two-year-old performance, this is comparable to the slowest Athlon dual-core on the AM4 socket. Folks poking fun at this benchmark don't care, but I do for my photography work.

So you're saying a 2018 ~7W CPU is about as fast as a 2019 35W CPU. Sounds pretty good to me.
 
So, there was nobody else at all involved in the design of the Athlon? The ENTIRE CPU was designed by you alone?

I did not diss the Athlon CPU; I simply pointed out that you're comparing two separate categories of CPUs, and you see that as a diss?

That's like me saying AMD's Threadripper 3970X is 280W vs the Athlon CPU at 35W, which means I'm dissing the Athlon one? Seriously?

You designing the CPU has nothing to do with anything here. You're still comparing two completely separate categories of CPUs that are not designed for the same purpose.

1) I am not comparing anything. Why do you keep saying I am comparing two CPUs?
2) I was in charge of design methodology, I was a member of the team that designed the power grid, clock grid, standard cell architecture, floorplan and “globals,” I designed the floating-point execution units, integer execution units, and scheduler/register renamer before passing them off to others to finish, I created the x86-64 (AMD64) instructions for integer operations, I was in charge of the methodology for placing the transistors and wiring, and I manually, in a text editor, fixed a last-minute bug on the day of tapeout, etc.
 
1) I am not comparing anything. Why do you keep saying I am comparing two CPUs?
2) I was in charge of design methodology, I was a member of the team that designed the power grid, clock grid, standard cell architecture, floorplan and “globals,” I designed the floating-point execution units, integer execution units, and scheduler/register renamer before passing them off to others to finish, I created the x86-64 (AMD64) instructions for integer operations, I was in charge of the methodology for placing the transistors and wiring, and I manually, in a text editor, fixed a last-minute bug on the day of tapeout, etc.

My apologies, I got confused by the quotes. I thought you were the guy who wrote the original quote and replied to my reply about it; this one specifically:

> This is not a good debut for the lowest-end performance already. Even if the scores mirror two-year-old performance, this is comparable to the slowest Athlon dual-core on the AM4 socket. Folks poking fun at this benchmark don't care, but I do for my photography work.

Now, I'm thinking you meant something else.

Sorry again.
 
This is incredibly good. This combination is getting over 60% of native speed. Even the second-best chip and software combinations that were designed primarily to execute x86 only got about 40% of native speed, and those were only mediocre with their native instruction sets.
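For reference, working just from the averages quoted in the article: 2,781 / 4,625 ≈ 0.60 multi-core and 811 / 1,118 ≈ 0.73 single-core, so the translated Geekbench runs land at roughly 60-73% of the same chip's native scores.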
 
I work at Autodesk and have the source code for both Maya and Fusion 360 (two of our Mac products) on my system.

This developer transition kit is completely and utterly inadequate. 16GB of RAM? Man, I feel bad for whoever gets stuck with the arduous task of getting our stuff running on this system. It's going to be awful.

Maya needs 96GB++ of memory to compile at a speed even vaguely approaching usable, and the development teams are typically using Ryzen 9 or Threadripper systems running Linux or Windows. These CPUs are 4x faster than the A12Z. Fusion's requirements for decent compile performance are somewhat lower than Maya's, but still well beyond 16GB of RAM.

Like many here, I am confident that the systems Apple introduces over the next year will be a significant step forward from this kit, but the gap they have to cross to meet what Autodesk's pro customers are using today -- never mind what they will expect two years from now -- is still really vast. They can't just catch up with the perf of the current Mac Pro. AMD Zen 3 is just around the corner and it's going to kick the everlovin' bejeezus out of every Mac and Intel system in 3D workloads.

But even if the CPU perf situation at the high-end turns out okay, what is really going to make or break it for us is video driver quality. It always has. NVIDIA and AMD are many, many years ahead of Apple on driver quality and developer relations.... if Apple continues to be uncaring about OpenGL and Vulkan, and if they don't have comprehensive raytracing support on next-gen AMD Radeon GPUs in a timely fashion, then Apple is going to lose the 3D market almost completely.

(Usual disclaimer: these are my opinions, not my employer's)


Whatever Autodesk is planning to do, it had better be impressive both in terms of quality and pricing. The current pricing for Maya has left many annoyed after the switch from 'Maintenance' to 'Subscription'. If AD isn't careful, Maya is going to start losing the 3D market as the new generation of artists increasingly prefers more affordable tools with arguably more modern code bases, such as Blender, Unreal/Unity, Houdini, etc.
Even after the big rejuvenation phase Maya has undergone over the last 5+ years, it still feels frustratingly flaky and slow in too many instances (disclaimer: I've been a daily Maya user for the past 20 years). Bifrost isn't gaining much traction out there, and the Viewport 2.0 development effort is no match for the likes of Unreal/Unity. As an AD employee, I'd be more worried about this before worrying about Apple's DTK hardware. Nonetheless, I hope to see the Maya devs make a serious and concerted effort with this move to ARM, because you'd better believe the Blender Foundation, Side Effects, Adobe, Foundry et al. will.
 