What is “AT”?
> I’m guessing AnandTech forums.

Yes, must be. I don’t think Ars has a “CPU forum”. AnandTech is now just an archive, though; as a news site it shut down at the end of August last year. Are those forums even moderated now?
> I’m kind of half-expecting Apple to bring Anand Shimpi into a more visible role. I thought he did an outstanding job explaining the attitudes and priorities of the Apple silicon design team in an interview with Andru Edwards back in February 2023.

Here’s my old transcript of his comments in that interview. It is edited for clarity, removing interjections like "I think", "kind of", and "sort of", but I've retained his rhetorical ", right? ..." and “So …” habits, as they capture the feel of the dialogue. It's a good overview of what Apple is trying to do.
Everyone has NOT switched to tiles!
AMD has (as I described above) for optionality reasons.
Intel did for marketing reasons, and, that being a stupid reason, it's one more thing killing them.
Meanwhile:
nV and Apple both use chiplets as an idea, but make the chiplets as large as practical before gluing them together, not the crazy Ponte Vecchio level disaggregation Intel is championing.
QC, MediaTek, Ampere, AMZ, Google (and anyone I've left out) are not using them.
Chiplets are a technology, and like most technologies, have their place. My complaint is that you seem to have absolutely drunk the Intel koolaid that chiplets are the optimal design point for everything going forward, even something as small as an M1-class chip (i.e. something targeting a tablet or low-end laptop).
It may be (unlike Intel I don't claim to predict the future ten years from now) that in ten years chiplets will in fact be the optimal design point for everything from a phone upward. But I doubt it, and that is certainly NOT the case today.
And I'm sorry, I can't take anyone who talks about the "M2/M3" lag seriously. This, like the claims about Apple loss of talent, is the sort of copium nonsense you read about on x86-support-group sites, not in genuine technical discussion.
> Yes, must be. I don’t think Ars has a “CPU forum”. AnandTech is now just an archive, ...

Sorry, yes, AT is the general shorthand for AnandTech.
> Apple moving to Tiling (chiplets) for ASi could definitely improve performance in regards to their GPU, just throw more GPU chiplets into the mix...
> - 32-core CPU (24P/8E)
> - 512-core GPU
> - 128-core Neural Engine
> - 960GB ECC LPDDR5X RAM
> - 2.16TB/s UMA bandwidth

A 512+ core GPU is already here, just not the cores you want to see, ofc. But still. I love these pipe-dream posts, when the max we'll see is like 1/4 of that.
As for Apple M series issues, all I know is what I've read and heard from folks in the industry and litho/fab Macheads. My admittedly limited understanding of the issue is that after that top-tier talent loss, the M2 clock was goosed a bit, and for M3 the existing team yielded some improvements but goosed the clocks quite a bit more.
This matches with a quick google: https://medium.com/@kellyshephard/m3-vs-m2-vs-m1-macbook-air-which-one-is-right-for-you-8e7c532e1de4
From M1 to M2, Apple increased transistor density by 25% and stayed on the same node; was there that massive an improvement?
From M2 to M3, Apple increased transistor density again by 20%, switched from N5 to N3, and raised clock speeds significantly, and saw a significant improvement. Is that surprising? No.
It's exactly what AMD and Intel have historically done on their "tick" generations.
I'm absolutely sure IPC and other improvements took place in M3; however, when you increase transistor density by close to 50%, jump to a newer node, and increase clock speeds, it's rather loopy to simply explain it all as innate Apple R&D IPC improvement.
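To make that accounting explicit, here's a minimal sketch of how a generation-over-generation speedup decomposes into a clock term and a per-clock term. The numbers are made up for illustration (the clocks are roughly the public M2/M3 P-core figures), not measurements:

```python
# Illustrative decomposition: perf = clock * perf-per-clock, so
# perf_ratio = clock_ratio * per_clock_ratio. Numbers are placeholders.
def decompose(perf_old, perf_new, clock_old_ghz, clock_new_ghz):
    perf_ratio = perf_new / perf_old
    clock_ratio = clock_new_ghz / clock_old_ghz
    per_clock_ratio = perf_ratio / clock_ratio  # what's left after clocks
    return perf_ratio, clock_ratio, per_clock_ratio

# Hypothetical M2 -> M3 style jump: +20% perf, ~3.49 -> ~4.05 GHz clocks.
perf, clock, icp = decompose(100.0, 120.0, 3.49, 4.05)
print(f"perf x{perf:.2f} = clock x{clock:.2f} * per-clock x{icp:.2f}")
# -> perf x1.20 = clock x1.16 * per-clock x1.03: in this made-up jump,
#    most of the gain would be clocks, not per-clock architecture.
```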
What "general industry feeling" would that be? Random idiots pretending to be experts on internet forums are not the same thing as industry knowledge. If you don't have the background to tell for yourself whether an idea is genuinely from an industry insider, maybe you shouldn't uncritically accept it as received wisdom.I'll stick with the general industry feeling that Apple has mostly maintained its quite nice improvements more based on one-shot Node jumps, clocking increases over some wizardry by its R&D guru's.
> Obviously the foundry node and ARM architecture are important - no one would disagree (well, maybe Qualcomm's lawyers for the latter part, but that's a separate issue). Heck, even Apple's choice of 16KB page sizes for its OS is helpful, because it helps enable larger L1 caches while maintaining a lower associativity and simpler design. And no, Apple's CPU designs aren't magic, any more than TSMC's fabs are or ARM's instruction set is. Heck, Qualcomm and ARM themselves are getting better and better - the X925 and Oryon-L may be behind the A18 Pro but are still damn good and closer than they've ever been to an Apple core - certainly much better than AMD/Intel have managed. I even have Qualcomm's original Oryon core down below in comparison to the M2, which is probably their best comparison given the node (N4 vs N5P).

That's a fair point.
I think Apple has done close to a miracle with the M series and, at the same time, has used node/process improvements it's paid top dollar for from TSMC to keep up some remarkable advances.
Actual 'improvement' in systems can be looked at in different ways: sometimes it's actual architectural improvement by Apple/Intel/AMD design engineers; other times it's intra-node process improvements that allow more speed, as Intel did with all their "+++" derivatives; and sometimes it's just jumping onto a newer node that gives 15-20% more speed and/or transistor density, at or near ISO.
I can certainly be wrong; however, in my view far too many people are assigning Apple's continued success primarily to some wizardry in Cupertino. I think the reality is that there is certainly some of that, but a lot of the rest is based on Apple paying for the lion's share of future node development at TSMC and getting exclusive, often primary, use of that advance.
Those advances generally seem to be about 15%, IIRC, when measured at industry-standard ISO conditions.
AMD was doing this to Intel all through the 2020s, until Intel finally jumped on N3B and started being fairly competitive.
I think Apple is mining the heck out of ARM to get some of these increases; however, it's also a much simpler ISA than x86. As it is simpler, I don't think it's going to be able to be mined for years and years as Intel and AMD have.
With nodes set to continue to slow, process improvements will likewise slow, whether ARM or x86. Apple still has headroom before hitting the 5-6 GHz redline x86 has, so it can continue to gas it for M5/M6. And I'll eat my hat if Apple somehow finds a way to avoid the power increases everyone else in the foundry world has seen.
And I wouldn't have bothered dumping x86, even nice Linux Mint, if I didn't think Apple was going to be dominant going forward.
> I think Apple is mining the heck out of ARM to get some of these increases; however, it's also a much simpler ISA than x86. As it is simpler, I don't think it's going to be able to be mined for years and years as Intel and AMD have.

I'll be honest, I'll give it a shot, but this is so wrong I'm not really sure where to begin. ARMv8 has a much simpler structure than x86-64 (i.e. all instructions have the same length), which allows it to be decoded more easily (a major reason why Apple and other ARM cores can go so wide more easily), and there is less historical baggage in the ARMv8/9 architecture (no 32-bit or lower instruction set hanging around). But the instruction set itself is no less "complex", and ARMv8/9 can attain similar if not greater levels of code density than x86. ARMv8 may be RISC, but it's pragmatic: it doesn't hew to classical RISC paradigms when performance would suffer. Further, the v8/9 instruction set is ever evolving, just as x86 is (although Intel recently pared back how ambitiously it would evolve x86, it is changing).
Obviously the foundry node and ARM architecture are important - no one would disagree (well, maybe Qualcomm's lawyers for the latter part, but that's a separate issue). Heck, even Apple's choice of 16KB page sizes for its OS is helpful, because it helps enable larger L1 caches while maintaining a lower associativity and simpler design. And no, Apple's CPU designs aren't magic, any more than TSMC's fabs are or ARM's instruction set is. Heck, Qualcomm and ARM themselves are getting better and better - the X925 and Oryon-L may be behind the A18 Pro but are still damn good and closer than they've ever been to an Apple core - certainly much better than AMD/Intel have managed. I even have Qualcomm's original Oryon core down below in comparison to the M2, which is probably their best comparison given the node (N4 vs N5P).
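(A quick illustration of the 16KB-page point: for a VIPT L1 cache that wants to avoid aliasing, the index bits must fit inside the page offset, so the rough upper bound on cache size is page size times associativity. A toy sketch, ignoring the micro-architectural tricks real designs use to cheat this bound:)

```python
# Rough VIPT L1 sizing rule: each way can be at most one page, so
# max_size = page_size * associativity. Real designs can work around
# this, but bigger pages make the straightforward design easier.
def max_vipt_l1_kb(page_size_kb, ways):
    return page_size_kb * ways

for page_kb in (4, 16):
    print(f"{page_kb:>2}KB pages, 8-way: up to {max_vipt_l1_kb(page_kb, 8)}KB L1")
# ->  4KB pages, 8-way: up to  32KB L1
# -> 16KB pages, 8-way: up to 128KB L1 (same associativity, 4x the cache)
```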
I would also contend though that this entire discussion has largely ignored Apple's efficiency cores which have their own trajectory in terms of performance, clocks, and power and are in many ways even more impressive than their high-performance cousins (and Qualcomm/ARM's little cores are also finally catching up here too).
BUT overall I think you are conflating a few things together. Let's try to tease things apart. A new foundry node allows a chipmaker to run a design at faster speeds at lower power. Higher transistor density can also help a chipmaker make a new design, though it's not technically necessary (Zen 2 and Zen 3 were on the same node, and Zen 3 was probably AMD's most impressive chip improvement, partially as a result). Further, merely moving to a new node doesn't magically equilibrate things. For years, ARM designs for Android would lag Apple by mere months on new process nodes, but even then they never came close to matching Apple's performance or efficiency in ST. But for a more modern, PC-oriented comparison, we can compare Zen 4 to the M2 (Zen 4 is on a marginally better node than the M2: N4P vs N5P) and Lunar Lake to the M3 (same node, N3B).
Here are Cinebench R24 results (data courtesy of NotebookCheck, with some differences - I subtract idle power out to try to get power under load only, which is about the closest wall-power measurements get to package power):
[Chart: Cinebench R24 performance and measured power]
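As a rough illustration of that idle-subtraction method, here's a minimal sketch. All the numbers are placeholders, not NotebookCheck's actual readings:

```python
# Approximate package power under load as wall power minus idle wall
# power, then compute points-per-watt efficiency. Placeholder values.
readings = {
    # chip: (wall_load_W, wall_idle_W, cinebench_r24_score)
    "chip_A": (40.0, 6.0, 1000),
    "chip_B": (90.0, 10.0, 1200),
}
for chip, (load_w, idle_w, score) in readings.items():
    pkg_w = load_w - idle_w            # crude proxy for package power
    print(f"{chip}: ~{pkg_w:.0f}W under load, {score / pkg_w:.1f} pts/W")
```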
Intel being on the same node as Apple allowed it to catch up to Apple in neither performance nor power consumption (a factor of die area x clock speed). Don't get me wrong, Lunar Lake is a massive improvement over Meteor Lake; Meteor Lake was so awful it wouldn't fit on this chart. However, with Intel prioritizing ST efficiency at every step of the way, sure, they did better than the (much bigger) AMD chip, but they didn't even catch Qualcomm's slightly worse version of the M2 on N4, never mind Apple on N3B. Similarly, AMD's Zen 5 HX 370 is on a slightly better node (N4P) than the M2 Pro (N5P), and while it manages only slightly worse ST performance, it has vastly worse power consumption with the higher clocks needed to achieve that. The end result of AMD and Intel "catching up" with Apple in terms of node is 2-3x worse efficiency at lower performance.

Now, unlike in R23 (which was technically native but not very well optimized for ARM), Apple does particularly well in the R24 benchmark. So let's look at Geekbench 6. And since the assertion is that Apple's development has "slowed down", let's compare Apple's progress from M1 (Nov 2020) to M4 (May 2024) versus AMD's from Zen 3 (Nov 2020) to Zen 5 (Jul/Aug 2024). (As an aside, with the simultaneous release of the M1 and Zen 3, November 2020 must've been a really bad month for morale in Intel's offices.) I'll use AMD's desktop chips to give them the best performance matchup and just look at clocks, since Zen 5 desktop on N4X uses quite a bit of power (also, measuring wall power on GB is harder given the nature of the benchmark). Yes, some people go overboard with praising Apple, but your posts are hewing the other way too much. It isn't just access to the latest fabs: Apple really does have an excellent design that they have continued to iterate on, as I will further demonstrate below.
Now here we have a problem. We COULD just take the top-line GB 6 score and divide by clock speed, or do the same with SPEC, or, like AMD, pick an assortment of various benchmarks to create a similar composite. But to be particularly blunt, that's nonsensical. Yes, I know that marketing departments do it all the time, but "IPC" as a weighted geomean of a bunch of incredibly disparate sub-benchmarks is meaningless. Also, it's supposed to be "instructions per clock", but ARM and x86 have different instructions, which means IPC isn't particularly comparable between them. Since what people really mean is performance per clock, and the term IPC has been so polluted from its original definition, I prefer to use the term ICP (ISO-clock performance). Also, when really trying to discuss performance, it's best to break out each sub-test separately. So let's do that!
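Concretely, the per-sub-test arithmetic is just score over clock. A sketch with placeholder scores (not the actual GB6 values behind the charts; the 3.2/4.4 GHz max clocks are the usual public M1/M4 figures):

```python
# ICP per sub-test: score / max clock in GHz, then compare chips as a
# ratio per sub-test instead of one blended geomean. Placeholder data.
subtests = {
    #            (m1_score, m4_score)
    "Clang":        (2000, 3600),
    "Ray Tracer":   (2500, 4300),
    "HTML5":        (1800, 3000),
}
m1_ghz, m4_ghz = 3.2, 4.4
for name, (m1, m4) in subtests.items():
    icp_m1, icp_m4 = m1 / m1_ghz, m4 / m4_ghz
    print(f"{name:>12}: ICP ratio M4/M1 = {icp_m4 / icp_m1:.2f}")
```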
Now some caveats: Geekbench, even in the best of circumstances, can show huge variance in results. The best way I've seen to handle this is to use violin plots generated by bootstrapping Geekbench data. Unfortunately, that requires more time than I was willing to put in; the resulting plots would be extremely busy given how many points of comparison I'm doing; the clock speeds of the CPU may not stay at max boost during the whole test, even in GB; and AMD especially could be biased because the database will include overclockers. So I tried to choose representative samples of the M1, M4, 7950X, and 9950X that best represent data from GB 6's average database and/or reviewers. There's little I can do about the clock-speed issue, though, and even GB's clock records in the JSON file are slightly suspect, as it isn't clear when or how they are recorded. So for those charts that plot ICP rather than raw performance, I simply use max clock speed.
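For anyone curious, the bootstrap itself is only a few lines; pulling and cleaning the GB6 database entries is the real work I skipped. A sketch with placeholder scores:

```python
import random
import statistics

# Bootstrap a rough confidence interval for a chip's "true" sub-test
# score from a sample of Geekbench runs. Sample values are placeholders.
scores = [2980, 3050, 3010, 2895, 3120, 3005, 2960, 3080]

def bootstrap_medians(sample, n_resamples=10_000):
    # Resample with replacement and record each resample's median.
    return sorted(
        statistics.median(random.choices(sample, k=len(sample)))
        for _ in range(n_resamples)
    )

meds = bootstrap_medians(scores)
lo, hi = meds[len(meds) // 40], meds[-(len(meds) // 40)]  # ~2.5%/97.5%
print(f"median ~{statistics.median(scores)}, 95% CI ~({lo}, {hi})")
```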
[Chart: ISO-clock performance ratios of M4/M1, Zen 5/Zen 3, and M1/Zen 5 per sub-test]
First up: the ISO-clock performance ratios of M4/M1, Zen 5/Zen 3, and M1/Zen 5. Here we see that AMD has achieved marginally better ICP growth than Apple ... until you look at the grey bars and realize that, with the exception of 3 benchmarks (HTML5, Object Detection (more on this in a bit), and Background Blur), the M1 still has better ISO-clock performance than Zen 5. This is really what @leman was trying to get at. Apple's ICP advantage is so *massive* that their 4-year-old chip on N5 has better ICP than AMD's latest N4X chip. That said, the M1 is obviously not performance-competitive with Zen 5 because its clock speeds are too low. So Apple has made architectural decisions that improved ICP in some key areas while also allowing them to dramatically increase clocks by nearly 40%! All while keeping ST power (mostly) under control (and yes, TSMC absolutely deserves credit here as well, but as we discussed above it isn't just the fab).
Here's a chart which showcases this more dramatically with the bar chart replaced with an XY scatterplot:
[Chart: XY scatterplot of ICP ratios - Zen 5/Zen 3 and M4/M1 versus M1/Zen 5]
Since Object Detection is such an outlier for both AMD and Apple, due to the introduction of AVX-512/VNNI (in Zen 4, but Zen 5 made improvements, especially for desktops with full AVX-512) and SME (technically Apple had AMX hardware previously, but with the introduction of SME it could officially use it in the instruction set, as opposed to through the Apple-only Accelerate framework), I made a smaller blowup chart excluding it. The x-axis is the ICP ratio of M1/Zen 5, and the y-axis is the iso-clock ratio of Zen 5/Zen 3 (green) and M4/M1 (blue). What we can see from this chart is that those areas where AMD has gotten better than, or at least closest to, the M1 are also the areas that Apple has worked hardest to improve - in some cases these were the weakest, least performant aspects of the M1 chip. As for Object Detection, well ... the M1 had no SME, and yet it took the introduction of AVX-512 vectors, more than doubling AMD's performance, to beat the SME-less M1 by 13% (the M1 had more than double the ICP advantage over Zen 3 here). Meanwhile, with SME in the M4, Apple just pulls away again. Apple is prioritizing its gains where it needs them most - again, this may make a weighted geomean ICP average look less good, but the details reveal a far richer story. Apple is also on the record that they have their own set of user data and benchmarks that they target, and these may not always correlate with SPEC/GB 6. Lastly on this point (I talk about this more below): just because ICP doesn't change doesn't mean there weren't architectural changes that improved performance.
Here's another way to look at the data. Instead of ICP ratios, you can look at ICP deltas: the difference in points per clock Apple achieved. Because Apple's M1 ICP is already so high, a middling ratio of improvement in the M4 might look less impressive ... until you realize that a smaller percentage of a bigger number can be very competitive with a larger percentage improvement of a much smaller number:
[Chart: ICP deltas (points per clock) for M4/M1 and Zen 5/Zen 3]
Here we see that, in units of clock-normalized performance, Apple's gains are competitive with AMD's.
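To spell out the ratio-versus-delta point with toy numbers (made up purely for illustration):

```python
# Toy numbers: a smaller percentage gain on a bigger base can match or
# beat a bigger percentage gain on a smaller base in absolute terms.
apple_old, apple_new = 600.0, 720.0   # +20% on a high-ICP base
amd_old, amd_new = 400.0, 520.0       # +30% on a lower-ICP base
print(f"Apple: +{apple_new/apple_old - 1:.0%}, delta {apple_new - apple_old:.0f} pts/GHz")
print(f"AMD:   +{amd_new/amd_old - 1:.0%}, delta {amd_new - amd_old:.0f} pts/GHz")
# Apple: +20%, delta 120 pts/GHz; AMD: +30%, delta 120 pts/GHz - equal
# absolute gains despite the smaller ratio.
```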
Finally let's put this all together and look at raw, non-clock normalized ST performance improvements of Apple's baseline core that goes into passively cooled tablets compared to AMD's top of the line, fastest clocked desktop processor (clocks charted as well at the end):
[Chart: raw ST performance ratios, M1/Zen 3 and M4/Zen 5 per sub-test, with clock-speed ratios at the end]
Blue is M1/Zen 3 and green is M4/Zen 5, so chips released at the same time. Anywhere above 1, Apple's chip outperforms AMD's; anywhere below, AMD's outperforms Apple's. The final bars are just clock-speed ratios, so, as expected, AMD's chips are clocked much faster than Apple's, BUT the M4 is closer in clocks to the 9950X than the M1 was to the 7950X. Apple wins more often than it loses, with nothing magical except that it's always doing so at lower clocks, even if increases in clocks have been a contributor to those performance improvements (which, again, still necessitated architectural changes). So yes, Zen 5 has slightly closed the ICP gap with the M4 compared to Zen 3 and the M1, but that's because they had so far to go, while Apple prioritized designs that increase clocks, one of its major weaknesses (although very high clocks like AMD's/Intel's are a weakness as well, no question). The key takeaway here is that Apple has ensured that in each and every test bar one (Text Processing; Clang and a couple of others are close), the M4/Zen 5 ratio (be it above or below 1) has improved on the M1/Zen 3 ratio. The green bar is always above the blue bar. And they've done it without busting their power budget. You don't get that from just fabs or even minor architectural improvements.
Some additional misconceptions:
I'll be honest, I'll give it a shot, but this is so wrong I'm not really sure where to begin. ARMv8 has a much simpler structure than x86-64 (i.e. all instructions have the same length), which allows it to be decoded more easily (a major reason why Apple and other ARM cores can go so wide more easily), and there is less historical baggage in the ARMv8/9 architecture (no 32-bit or lower instruction set hanging around). But the instruction set itself is no less "complex", and ARMv8/9 can attain similar if not greater levels of code density than x86. ARMv8 may be RISC, but it's pragmatic: it doesn't hew to classical RISC paradigms when performance would suffer. Further, the v8/9 instruction set is ever evolving, just as x86 is (although Intel recently pared back how ambitiously it would evolve x86, it is changing).
====
As @mr_roboto alluded to in his post above, you often can't just increase clock speed and maintain the same IPC/ICP - think about interactions with the memory hierarchy for one. Cache accesses or worse cache misses and thus memory accesses are not necessarily tied to clock speed which means simply increasing clocks with no other changes to the architecture running any meaningful program will eventually lower the performance per clock as the CPU will simply be waiting additional cycles for data. We know Apple made small changes to the performance core of the M2 and much larger changes to the architecture of the performance cores of the M3 and M4, but they also massively increased clocks. Much more so than Intel and even AMD. This huge increase in clock speed has likely eaten much of the ICP gains that would've happened (see Horizon Detection where for the particular M1 and M4 chosen the M4 has very slightly lower ICP than the M1). On top of that we have my caveat above that simply using "max clocks" as many do, including myself above, may underestimate true ICP of any particular chip.
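To put a rough number on that memory-hierarchy point (the 100 ns latency is an illustrative round figure, not a measurement for any of these chips):

```python
# A fixed DRAM latency in nanoseconds costs more cycles as clocks rise,
# so perf-per-clock erodes unless the architecture hides more latency
# (bigger/smarter caches, deeper out-of-order windows, prefetchers).
mem_latency_ns = 100.0  # illustrative round number
for clock_ghz in (3.2, 4.4):
    stall_cycles = mem_latency_ns * clock_ghz  # ns * cycles/ns
    print(f"{clock_ghz} GHz: a full miss costs ~{stall_cycles:.0f} cycles")
# -> 3.2 GHz: ~320 cycles; 4.4 GHz: ~440 cycles for the very same miss.
```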
====
Lastly, throughout all of your posts there seems to be this notion that the M1 was something magical compared to what came before - a feat that Apple has never been able to duplicate. Don't get me wrong, Firestorm was a great core. But when placed in its context, the M1 and Firestorm were simply the first time Apple could showcase its massive efficiency advantage while also competing in performance with AMD and Intel. It was a natural progression in the evolution of Apple's design, which Apple has continued to iterate on. This may have come as a massive shock to the PC world, which had largely ignored what Apple was doing over the last decade, but it was much less of one to those who had paid attention. Since you're on the AT forums, you must've read Andrei's preview of the M1 where he laid this out (and he was much more explicit in other comments he made to this effect). I hope I've also done a good job of illustrating that while Apple has seemingly slowed down in some respects, this has been vastly overstated, and the M1 was not some outlier either in its antecedents or its descendants.
================
Now there is some possibility that Apple will hit a wall with its designs. Eventually going wider and wider may yield diminishing returns as most code will simply lack the instruction level parallelism to justify it. Apple may also suffer a design crisis of some other kind. I can't know the future. But again I hope that I've demonstrated that, so far, the "great slowdown" due to "brain drain" is more myth than reality - nor can Apple be accused of resting on its laurels.
> I mean, if memory serves me, even M3 Pro GPU beats 890M in TFLOPS (which of course is only part of the GPU performance story, but it represents a good indicator of raw processing power I assume).

Isn't the 890M more of a competitor to the base M-series rather than the Mx Pro? It has only 1280 shading units, which is similar to the 1280 found in the M3 and M4. AMD clocks its GPU almost twice as high, though.
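(For reference, the raw-TFLOPS arithmetic behind comparisons like this. The clock figures are approximate public numbers and should be treated as assumptions:)

```python
# Peak FP32 TFLOPS ~= ALUs * 2 (FMA = 2 flops/cycle) * GHz / 1000.
# Clocks below are rough public figures and may be slightly off.
def tflops(alus, ghz):
    return alus * 2 * ghz / 1000.0

print(f"890M   (1280 ALUs @ ~2.9 GHz): ~{tflops(1280, 2.9):.1f} TFLOPS")
print(f"M4 10c (1280 ALUs @ ~1.5 GHz): ~{tflops(1280, 1.5):.1f} TFLOPS")
# At similar width, the AMD part's ~2x clock is most of its raw-TFLOPS
# edge; it says nothing about efficiency or delivered performance.
```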
> Isn't the 890M more of a competitor to the base M-series rather than the Mx Pro? It has only 1280 shading units, which is similar to the 1280 found in the M3 and M4. AMD clocks its GPU almost twice as high, though.

Agree, but as far as I know, this is the best iGPU AMD currently offers, thus this is a direct apples-to-apples 😀 comparison of the Mx GPU (which is part of an SoC, thus an iGPU) with the relevant SotA, as opposed to a comparison with a discrete GPU.
Of course, another way to view Mx GPU progress is how much Apple has closed the gap with, let's say, the 3080, while using a fraction of the power. For me, this level of performance is impressive.
> I think it would be great for us consumers if Apple could deliver usable 5-6 TFLOPS in the base M-series Macs.

The M4 10-core GPU is almost there; I guess from M5 onwards we will see >= 5 TFLOPS in the base Mx models.
> Yes, I know that marketing departments do it all the time, but "IPC" as a weighted geomean of a bunch of incredibly disparate sub-benchmarks is meaningless. Also, it's supposed to be "instructions per clock", but ARM and x86 have different instructions, which means IPC isn't particularly comparable between them. Since what people really mean is performance per clock, and the term IPC has been so polluted from its original definition, I prefer to use the term ICP (ISO-clock performance).

Hmmmm ... probably early-morning/late-night tired thoughts, but you know, since "iso-clock performance" is already taken as terminology, and it doesn't quite aptly describe what I'm going for here since it means same-clock performance, probably a more direct analog to IPC would simply be PPC (performance per clock - yes, I know PPC is already taken too) or clock-normalized performance (CNP). But the traditional IPC acronym (instructions per clock) really is also the wrong term to describe what people usually compare, especially when comparing across architectures with obviously different instruction sets. So maybe in the future I'll use PPC or CNP.
Amazing post, very informative.
I would also like to add to the discussion, which focuses mainly on CPU performance and efficiency, how impressive the GPU progress in the M-series has been. Have you seen any iGPU from AMD, Intel, or Qualcomm provide this kind of performance at these power levels, as, for example, the M4 Max does? I mean, if memory serves me, even the M3 Pro GPU beats the 890M in TFLOPS (which of course is only part of the GPU performance story, but it represents a good indicator of raw processing power, I assume).
> So in only 10 years we will have negative feature sizes. Macs will produce more power than they consume.

So much wrong here...
Seriously. Each new process requires a new and MUCH more expensive production line, and at some point the cost of replacing the line is no longer worth it. Over the last few years Apple has been getting a free ride; they just wait while chips get smaller. Apple does not need to reinvent much. They are basically running 1970s-vintage BSD Unix on a smaller and cheaper computer. The overall idea has been unchanged for 50+ years. Maybe someday AI will push us away from that model. But on the other hand, the same OS has survived, and it is older than many people reading this.
Feature size reduction can only continue for so long. In reality "zero" is a hard limit and atoms have finite size.
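A quick sanity check on the atomic limit (the silicon lattice constant is a real physical figure; node names are marketing labels, so actual pitches are larger, but the argument holds):

```python
# Silicon's lattice constant is ~0.543 nm, so a literal "2nm" feature
# would span only a handful of unit cells - and node names already
# don't correspond to any physical dimension on the die.
SI_LATTICE_NM = 0.543
for feature_nm in (7.0, 5.0, 3.0, 2.0):
    cells = feature_nm / SI_LATTICE_NM
    print(f"a literal {feature_nm}nm feature spans ~{cells:.1f} lattice cells")
# Shrinking only stops at zero in arithmetic; useful shrinking stops
# well before, at a countable number of atoms.
```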
> Hmmmm ... probably early-morning/late-night tired thoughts, but you know, since "iso-clock performance" is already taken as terminology, and it doesn't quite aptly describe what I'm going for here since it means same-clock performance, probably a more direct analog to IPC would simply be PPC (performance per clock - yes, I know PPC is already taken too) or clock-normalized performance (CNP). But the traditional IPC acronym (instructions per clock) really is also the wrong term to describe what people usually compare, especially when comparing across architectures with obviously different instruction sets. So maybe in the future I'll use PPC or CNP.

And here come the Strix Halo chips:
Thanks so much! Indeed, Apple's GPUs are impressive. I think for iGPUs to compare with the Max we'll have to wait and see what next year brings. Strix Halo is going to have a beefy iGPU (though I think slightly less so than the Max, but for many games that won't matter since, you know, the vast majority are native Windows and so forth). The Qualcomm Elite V2 is rumored to split the product line in two: one base-M-like and one more Max/Pro-like with a much improved GPU over what shipped in V1. And Nvidia is rumored to be releasing its own ARM SOC for consumer PCs later in the year, but no word on its size as far as I know. They have a line of discrete mobile GPUs, the Max-Q line, that are structured more similarly to Apple's M-series GPUs. That design, integrated directly into an SOC, could be quite good.
What will also be interesting is this: when Apple announced the Pro and Max line of chips for the M1, pundits fell over themselves to say that only Apple could afford to build an SOC this big and make the economics work - that for AMD, Intel, etc. this strategy would never be profitable, because they can only sell the SOC, not the whole device. Since your typical chip maker has to make a profit selling the SOC and the device maker has to make a profit on the device, the end cost simply wouldn't work for the average non-Mac PC consumer, who cares a lot less than Mac users about battery life and quiet operation. Since multiple chip makers are going for it anyway, we'll see if that ends up being true. I'm sure they've all got business plans that claim it'll work, but plans don't always come true. But then again ... pundit predictions about what can never be have a somewhat worse track record.
> And here come the Strix Halo chips:

V-Ray and Corona for AS are CPU renderers.
AMD’s beastly ‘Strix Halo’ Ryzen AI Max+ debuts with radical new memory tech to feed RDNA 3.5 graphics and Zen 5 CPU cores - www.tomshardware.com
They are indeed comparing it to the M4 Pro GPU rather than the Maxes, the most impressive result is against the cut-down M4 Pro GPU, and V-Ray is a bit of an outlier. I wonder how optimized it is for Macs relative to Blender/Redshift (Cinebench). Both V-Ray and Corona are made by the same developer, I believe.
EDIT: I'm not sure if those are all supposed to be GPU tests or if some of them are mixing CPU and GPU tests together in the same chart. Confusing.
> V-Ray and Corona for AS are CPU renderers.

Yeah, I figured that out. In fact they are ALL CPU benchmarks for the Mac, as far as I can tell. I had naively thought that with the new integrated GPU, that's what they would've shown off, even when comparing against the Mac. Do you know how well optimized V-Ray and Corona are for AS? The link I found in my edit to the post above talks about Macs, including AS Macs, needing SSE compatibility for V-Ray, which makes me think it isn't very AS-optimized and may not even be native?
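(If anyone wants to verify whether a shipped Mac binary actually contains arm64 code, one way is to read its Mach-O header directly. A sketch - the path at the bottom is hypothetical, and `lipo -archs` or `file` from Terminal does the same job:)

```python
import struct

# Mach-O constants (from <mach-o/loader.h> and <mach-o/fat.h>)
FAT_MAGIC_BE = 0xCAFEBABE      # universal ("fat") binary, big-endian header
MH_MAGIC_64_LE = 0xCFFAEDFE    # 0xFEEDFACF stored little-endian, read as BE
CPU_TYPE_ARM64 = 0x0100000C

def contains_arm64(path):
    """Return True if the Mach-O file at `path` has an arm64 slice."""
    with open(path, "rb") as f:
        magic = struct.unpack(">I", f.read(4))[0]
        if magic == FAT_MAGIC_BE:
            (nfat,) = struct.unpack(">I", f.read(4))
            for _ in range(nfat):
                # fat_arch: cputype, cpusubtype, offset, size, align
                cputype = struct.unpack(">5I", f.read(20))[0]
                if cputype == CPU_TYPE_ARM64:
                    return True
            return False
        if magic == MH_MAGIC_64_LE:  # thin 64-bit little-endian binary
            (cputype,) = struct.unpack("<I", f.read(4))
            return cputype == CPU_TYPE_ARM64
        return False

# Hypothetical path - point it at the actual benchmark binary to test:
# print(contains_arm64("/Applications/V-Ray Benchmark.app/Contents/MacOS/vray"))
```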
> Yeah, I figured that out. In fact they are ALL CPU benchmarks for the Mac, as far as I can tell. I had naively thought that with the new integrated GPU, that's what they would've shown off, even when comparing against the Mac. Do you know how well optimized V-Ray and Corona are for AS? The link I found in my edit to the post above talks about Macs, including AS Macs, needing SSE compatibility for V-Ray, which makes me think it isn't very AS-optimized and may not even be native?

Maybe someone has time to look into whether they used the current versions of the V-Ray and Corona renderer Benchmark apps, which are here:
======
Fascinating that the GPU is on the IO die. Again, I wonder what process that is using, if true, since on the desktop, anyway, the IO die was on an older N6 process. In fact, I don't know if the process node for any of the dies in the Strix Halo or Fire Range processors has been confirmed yet.
EDIT: the Strix Halo SOC die (with I/O, NPU, and GPU) appears to be 5nm - unsure which specific 5nm node:
AMD Debuts Ryzen AI Max Series "Strix Halo" SoC: up to 16 "Zen 5" cores, Massive iGPU - www.techpowerup.com
> Maybe someone has time to look into whether they used the current versions of the V-Ray and Corona renderer Benchmark apps, which are here:

Based on what I could find online, I think Corona and V-Ray are AS-native. Not sure how well optimized they are for it, though.
V-Ray 6 Benchmark updated (looping tests, RTX and CUDA comparisons, a new benchmark scene, and support for Apple silicon) - www.chaos.com
Corona Benchmark | Chaos (free CPU benchmark built on Chaos Corona 10) - www.chaos.com
(The full renderer apps include support for Distributed / Network rendering under MacOS out to NVidia GPUs in Windows and Linux workstations, so maybe that’s why… SSE stuff)
> Looks like Mac Studio has competition from Nvidia Project Digits with Grace 20-core ARM CPU, Blackwell GPU, 128GB LPDDR5X unified memory, ConnectX fabric, 4TB storage, running Linux, etc. for $3000.
> https://www.nvidia.com/en-us/project-digits/
> https://nvidianews.nvidia.com/news/...ry-desk-and-at-every-ai-developers-fingertips

Good, the competition will only benefit Mac users. If Apple really did move their best CPU team to work on servers for a while, as has been rumored, they'd better get hiring some more people as replacements.
> Good, the competition will only benefit Mac users. If Apple really did move their best CPU team to work on servers for a while, as has been rumored, they'd better get hiring some more people as replacements.

Not quite sure exactly how big the GPU is from the "1 Petaflop of FP4 compute". I know they're marketing it for AI purposes, but I'd be very curious about its standard FP32 TFLOPS with that much RAM.
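Back-of-the-envelope only, and built entirely on assumptions: that the quoted petaflop is 2:1 sparse FP4, and that dense throughput roughly halves at each doubling of precision, which real GPUs don't follow exactly (consumer NVIDIA parts often run FP16 at the same rate as FP32, for instance):

```python
# Crude ladder from the marketing FP4 figure down to FP32. Both the
# sparsity assumption and the halving-per-precision-step assumption
# are guesses; actual Blackwell ratios may differ substantially.
fp4_sparse_tflops = 1000.0
fp4_dense = fp4_sparse_tflops / 2   # drop the 2:1 sparsity doubling
fp8 = fp4_dense / 2
fp16 = fp8 / 2
fp32 = fp16 / 2
print(f"Guesstimated dense FP32: ~{fp32:.0f} TFLOPS")  # ~62 TFLOPS
```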
"In addition, using NVIDIA ConnectX® networking, two Project DIGITS AI supercomputers can be linked to run up to 405-billion-parameter models." $6,000 for that is not too bad at all.
The 3 nm process also enables Arm to push out higher clock speeds on the Cortex X925 core, up to 3.8 GHz, to be exact.