For the Mac Pro, why not simply put a 96-core AMD EPYC CPU in it with an RTX 4090 (if Apple can solve its politics with NVIDIA), while retaining user expandability and repairability for the Mac Pro?
1. macOS is limited to 64 threads. Any x86 processor with SMT (Hyper-Threading) and more than 32 cores is relatively ineffective for macOS. Up to 64 cores, firmware settings could probably be used to permanently fuse off the SMT functionality, but performance on a wide variety of workloads isn't going to do all that well.
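One way to picture that ceiling is a hypothetical sketch that treats the limit as a 64-bit logical-CPU mask. This is an illustration of the arithmetic, not a claim about xnu internals:

```python
MASK_BITS = 64  # a 64-bit mask can name at most 64 logical CPUs


def usable_logical_cpus(cores: int, smt: int = 2) -> int:
    """Logical CPUs an OS with a 64-bit CPU mask can address."""
    return min(cores * smt, MASK_BITS)


# A 32-core SMT chip fills the mask exactly:
print(usable_logical_cpus(32))         # 64
# A 48-core SMT chip strands a third of its hardware threads:
print(usable_logical_cpus(48))         # still 64, of 96
# Fuse SMT off and core count can climb to 64 before hitting the wall:
print(usable_logical_cpus(64, smt=1))  # 64
```

Past 32 SMT cores, every extra core ships hardware threads the scheduler simply cannot reach.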
Your question is off base. The notion that Apple "has to build a clone of AMD/Intel server chips" is way off in the weeds: why would Apple build a CPU package that doesn't match its own operating system? Apple is not at all likely to run off and spend giant sums of money making other people's operating systems better. The M-series has an 'M' primarily because it is supposed to make macOS (and iPadOS) run better. That is the primary objective.
ServeTheHome used dual EPYC/Xeon SP systems in their benchmarks on this:
"...
Still, we wanted to show why acceleration matters in a use case that was pertinent to us. As a result, we bootstrapped the nginx QAT acceleration to the 32-core 8462Ys, and then ran the full STH nginx stack with the database (and minus back-end tasks like backups/ replication and such) all on a single node and compared it to the AMD EPYC 9374F. Here is what we saw:
... "
Basically, accelerators can matter as much as core count. QAT off: AMD wins. QAT on: Intel wins.
The accelerators that Apple attaches to the SoC matter. It is not always simply a matter of "most cores wins".
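That cores-versus-accelerators trade can be put in toy arithmetic. The numbers below are entirely hypothetical and are NOT the ServeTheHome results; they just show the shape of the flip:

```python
def app_throughput(cores: int, per_core: float,
                   crypto_share: float, accel_on: bool) -> float:
    """Toy model: 'crypto_share' of each core's time goes to TLS
    handshake work unless a QAT-style accelerator absorbs it."""
    app_time = 1.0 if accel_on else 1.0 - crypto_share
    return cores * per_core * app_time


# Made-up numbers: same core count, AMD with faster cores, 30% of
# CPU time spent on crypto when nothing offloads it.
amd       = app_throughput(32, 1.1, 0.3, accel_on=False)
intel_off = app_throughput(32, 1.0, 0.3, accel_on=False)
intel_on  = app_throughput(32, 1.0, 0.3, accel_on=True)

print(amd > intel_off)  # True: accelerator off, the faster cores win
print(intel_on > amd)   # True: accelerator on, the offload wins
```

The point is that an accelerator effectively hands the freed CPU time back to the application, which can outweigh a raw core-speed deficit.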
Are the new SP Gen 4 accelerators going to completely stop the server market share bleed from Intel to EPYC Zen 4? No. Will they significantly slow the bleed in some key submarkets if Intel ships enough volume at a quick pace (and doesn't get too greedy on price)? Probably yes.
Apple isn't trying to place an SoC in the generic server market. They are trying to place something in the single-user workstation market. If the accelerators match the user workload, then myopically looking only at CPU core counts is completely missing the boat. There is a reasonably high probability that Apple will selectively use additional high-density logic accelerators in some key areas to offset the pressure to add more than 64 CPU cores to their systems. In fact, it wouldn't be surprising if Apple stayed below 64 for several generations.
Everything "graphical output" that some folks try to load onto CPU cores, Apple will probably try pretty hard to push onto the GPU cores (where the macOS/Unix thread limit doesn't matter). AI/ML inference workloads go to AMX and NPU cores, again where the macOS thread limit doesn't matter. Video de/encoding goes to fixed-function hardware; no macOS thread limits. Image analysis calculations likewise go to fixed-function hardware, where there are no macOS thread limits.
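That offload strategy is really just thread accounting. A toy tally with made-up workload shares (none of these numbers come from Apple):

```python
# Hypothetical thread demand for a 'pro' workload, by category:
workload = {
    "render": 40,         # would-be CPU threads for graphical output
    "ml_inference": 24,   # AI/ML inference work
    "video_encode": 16,   # de/encoding streams
    "general": 48,        # genuinely CPU-bound work
}

# Engines that absorb each category off the CPU cores:
offload_to = {
    "render": "GPU",
    "ml_inference": "NPU/AMX",
    "video_encode": "fixed-function",
}

# Only the categories with no offload engine still need CPU threads.
cpu_threads_needed = sum(v for k, v in workload.items()
                         if k not in offload_to)
print(cpu_threads_needed)  # 48, comfortably under the 64-thread ceiling
```

Every category pushed onto an accelerator is core-count pressure that never reaches the 64-thread wall.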
There likely will not be very high overlap with the Intel Xeon SP accelerators. Doubtful Apple is going to do triple backflips to make Ethernet data traffic go faster. Database query acceleration ... probably not. Apple already has some compression accelerators (besides video), so there is some overlap. Encryption also.
But the general trend line is to put multiple dies into one package/socket. AMD and Intel are already well down that path. Apple had a "too chunky" chiplet in the M1 generation, but that probably isn't a permanent miscue.
Since the Mac Pro usually supports dual chips, Apple could even put a 192-core AMD CPU in it.
The Mac Pro hasn't gotten new support for dual CPU packages since 2009. [2010 and 2012 were largely just rehashes of the same logic board with minor differences.] So 2006-2009: two-package support, 3 years. 2013-2022: support for just one package, 9 years. So "usually supports" is not particularly accurate. Even if you throw in the stale bone of 2010-2013, it is still less than half of the Mac Pro era. (Trying to go back and drag in the PowerMac systems just digs an even deeper hole.)
Even in the server space, multiple-package setups are dying.
A Dell product manager, as a 'guest' author at The Next Platform in 2019 (sales numbers since have only confirmed this):
"... Dual socket servers are creating performance challenges due to Amdahl’s law – that little law that says the serial part of your problem will limit performance scaling. You see, as we moved to the many core era, that little NUMA link (non-uniform memory access) between the sockets has become a huge bottle neck – ... "
— nextplatform.com
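The Amdahl's-law point in that quote is easy to put numbers on. A minimal sketch with a hypothetical workload split, using the classic formula and ignoring the NUMA penalty entirely (which only flatters the dual-socket case):

```python
def amdahl_speedup(parallel_fraction: float, n_cores: int) -> float:
    """Classic Amdahl's law: the serial fraction caps scaling."""
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / n_cores)


# Hypothetical workload that is 95% parallelizable:
p = 0.95
single_socket = amdahl_speedup(p, 32)  # one 32-core socket
dual_socket = amdahl_speedup(p, 64)    # two sockets, 64 cores total

print(round(single_socket, 2))  # ~12.55x over one core
print(round(dual_socket, 2))    # ~15.42x over one core
```

Doubling the sockets from 32 to 64 cores buys roughly 1.23x here, and that is before the cross-socket NUMA link takes its cut.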
A similar notion, same author, later that year:
https://www.dell.com/en-us/blog/4-new-reasons-to-consider-1-socket-server/
A more recent article:
" ...
AWS is using Graviton and Graviton2 single-socket machines in its vast EC2 server fleet. Google, as we have previously reported, is rolling out its Tau instances on the Google Cloud based on a single-socket “Milan” Epyc 7003 server node to drive down costs while driving up performance. Significantly, Google says that a single-socket 60-core Epyc server can offer 56 percent higher performance and 42 percent better bang for the buck than Amazon Web Services can do with its single-socket Graviton2 nodes. ... "
— nextplatform.com
Does Graviton 2/3 run 100% of all Amazon Web Services workloads? No. Do they run enough to pay for Graviton R&D and production? Yes. Running 100% of all workloads is a 'fake' requirement; it is not required. The Mac Pro running 100% of all possible x86_64 workloads isn't a serious requirement either. Will Apple be able to sell everything to everybody? No. Is that an Apple requirement? No.
The next Mac Pro is extremely unlikely to support multiple CPU packages. Multiple chiplets in a single coherent package? Yes. Multiple full-blown packages? Probably not.
Does Apple really believe the M2 Extreme would beat a 192-core AMD CPU and an RTX 4090? Heck, you could probably put multiple RTX 4090s in the Mac Pro (if Apple solves its politics with NVIDIA).
Wrong question. The real question is: does Apple absolutely need a server chip and a 4090 killer to create a reasonably good single-user workstation? No. The misdirection here is that Apple 'has to' have a 4090 killer. That would be a nice-to-have, but it isn't absolutely necessary. Tons of workstations will be sold even in the x86 space that don't have a 4090 in them. A decent number will, but a substantially larger number won't. In the single-user workstation space, Threadripper Pro will likely outsell EPYC.
For laptops, I get it. ARM offers nice battery life, but a Mac Pro has no battery life.
That is a goofy notion. Back to the "why single-socket servers" article from 2019:
"... To fix this we need more pins and faster SERDES (PCIe Gen4/5, DDR5, Gen-Z), 1-socket enables us to make those pin trade-offs at the socket and system level. .."
Abnormally high power consumption is part of the reason Intel is 'stuck' with higher socket-consumption problems. And yes, even datacenters have a power budget. The long-distance, ever-faster SERDES tend to soak up power, which leads to heat, which tends to migrate toward the CPU cores' logic. The thermal management system will at some point chop the frequency of the cores to try to get back within the thermal budget, and ... performance will go down.
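That feedback loop can be sketched as toy arithmetic. The wattages below are hypothetical, and real frequency/power scaling is far from linear, so treat this only as the shape of the problem:

```python
def sustained_ghz(base_ghz: float, core_watts: float,
                  serdes_watts: float, budget_watts: float) -> float:
    """Toy thermal model: when cores + SERDES exceed the socket
    budget, the cores surrender frequency (linearized here)."""
    if core_watts + serdes_watts <= budget_watts:
        return base_ghz
    usable = max(budget_watts - serdes_watts, 0.0)
    return base_ghz * usable / core_watts


# Same 250 W socket budget, increasingly hungry long-reach SERDES:
print(sustained_ghz(3.5, 250, 20, 250))  # modest I/O power draw
print(sustained_ghz(3.5, 250, 60, 250))  # fast SERDES eat the budget
```

Every watt the I/O burns under a fixed budget comes straight out of the clocks the cores can sustain.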
For a single-user workstation, having the 12-, 24-, and 48-core SoCs all deliver the same, more than decently high, single-threaded performance is more a feature than a "problem". A user shouldn't have to buy a lot more cores than they need if single-threaded performance is all they are looking for. Likewise, they shouldn't have to trade off single-threaded performance (or giant gobs of money) if they need a mix of single- and multi-threaded workloads.
Even if it is just the CPU cores, if they all generate lots of heat then you have the same issue the Mac Pro 2013 ran into with its single (jointly used) thermal transfer system.
The modern Arm instruction set has very little to do with "battery life". The Ampere Computing server processor's stats:
"... Ampere Altra featuring 80 cores fabricated on TSMC's N7 process for hyperscale computing. It was the first server-grade processor to include 80 cores, and the Q80-30 conserves power by running at 161 W in use. The cores are semi-custom Arm Neoverse N1 cores with Ampere modifications. It supports a frequency of up to 3.3 GHz with TDP of 250 W, 8ch 72-bit DDR4, up to 4 TB DDR4-3200 per socket, 128x PCIe 4.0 lanes, 1 MB L2 per core and 32 MB SLC. ..."
— en.wikipedia.org
250 W is not going into anyone's reasonably portable laptop. [Both Intel and AMD have taken to slapping a laptop label on a desktop SoC for their super-performance laptop chips, meant to be paired with an equally hot discrete GPU. So there is now a whole class of "plug it into the wall the vast majority of the time" laptops, and a 250 W laptop CPU isn't 'outrageous' anymore. It should be, but it is not.] Neither is 4 TB of RAM. Neither is 128 PCIe 4.0 lanes.
Arm is less constipated than x86_64, which is trying to drag around every instruction so it can clone Multics and run 16-bit DOS into the 21st century. Arm is an instruction set that allows implementers to drop really old stuff. Apple habitually drops some really old stuff every 10-15 years, so it is a better match in philosophy.