Some notes on memory bandwidth of the Apple Silicon machines (which is already extremely high) vs the new Ryzen / Intel CPUs that were released a few days / weeks ago:
Ran a simple benchmark on a new Ryzen 7950x desktop build (64GB RAM) here in the lab (the build will be returned to the supplier) vs my M1 Max laptop (64GB RAM).
Task: Take about 10000 parquet files (10.6GB total) and append them into 1 dataframe (> 400 million observations) in memory.
Hypothesis: The Ryzen 7950x should be way faster - at first thought - because it has 16 cores (versus 8 M1 Max performance cores) that are also clocked way higher.
Result: They are equally fast because the Ryzen CPU is bottlenecked by memory bandwidth (very fast cores but just 2 memory channels on the CPU).
The files:
The task is most efficiently done in parallel using all available cores; I used both polars (Rust) and pandas-modin (C++) to do this as fast as possible.
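For reference, a minimal sketch of what such a load could look like with polars. This is not the exact script used for the benchmark; the function names and the `data/parquet` directory are hypothetical, and it assumes all files share the same schema. polars runs the collected query on its own thread pool, which uses all available cores by default.

```python
from pathlib import Path


def list_parquet_files(directory):
    """Return all .parquet files directly inside `directory`, sorted by name."""
    return sorted(Path(directory).glob("*.parquet"))


def load_all(directory):
    """Read every parquet file and append them into one in-memory DataFrame."""
    import polars as pl  # imported lazily so the file listing works without polars

    paths = list_parquet_files(directory)
    # One lazy scan per file; nothing is read from disk yet.
    lazy_frames = [pl.scan_parquet(p) for p in paths]
    # Concatenate the lazy frames, then execute the whole query at once.
    return pl.concat(lazy_frames).collect()


# Usage (hypothetical directory):
# df = load_all("data/parquet")
# print(df.shape)
```

With roughly 10000 small files, building the lazy scans first and collecting once leaves the scheduling of reads to polars instead of a hand-written worker pool.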
When using all 8 performance cores on my M1 Max, memory bandwidth to the CPU is at about 120 GB/s (theoretical max is 200 GB/s).
Yet the Ryzen 7950x can do at most 81 GB/s of memory bandwidth, as the memory runs at 5200 MT/s: (5200 MT/s * 8 bytes * 2 memory channels) / 1024 = 81.25 GB/s (you can stretch this to about 100 GB/s if you heavily overclock). Thus, despite its 16 faster cores, the 7950x is only as fast as my M1 Max with 10 cores in this task, because about 6 Ryzen cores are enough to reach that 81 GB/s of bandwidth. The other 10 cores are starved of input and just idle.
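The arithmetic above can be reproduced with a few lines of Python. The function name is made up for illustration, the 8 bytes per transfer assumes a 64-bit wide memory channel, and the division by 1024 follows the convention used in this post (spec sheets usually quote decimal GB/s, which gives a slightly higher number).

```python
def peak_bandwidth_gb_s(mega_transfers_per_s, channels, bytes_per_transfer=8):
    """Theoretical peak memory bandwidth in GB/s (MB -> GB via 1024, as in the post)."""
    mb_per_s = mega_transfers_per_s * bytes_per_transfer * channels
    return mb_per_s / 1024


if __name__ == "__main__":
    # Ryzen 7950x with dual-channel memory at 5200 MT/s
    print(peak_bandwidth_gb_s(5200, channels=2))  # 81.25
```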
This is not new; others have run similar tasks with similar results. E.g.
https://tlkh.dev/benchmarking-the-apple-m1-max who finds that
"... adding more cores on the 5600X does not help (2 cores are enough to maximize memory bandwidth), while 10 cores on the M1 Max is the optimal configuration".
The M1 Ultra has 20 cores and 400GB/s of memory bandwidth and thus runs way faster than the Ryzen 7950X, as none of its 20 cores are starved. The gap widens when the Ryzen 7950X is decked out with 128GB of DDR5 RAM (4 DIMM slots), because the memory then runs at a slower 3600 MT/s, which is a meager 56.25 GB/s of memory bandwidth. 5 Ryzen cores can fully consume that; the other 11 cores will just idle.
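A rough way to see where the "5 cores" figure comes from: assume every busy core draws the same share of bandwidth, and take that share from the earlier observation that about 6 cores saturate 81.25 GB/s. The linear model and the function names below are simplifying assumptions for illustration, not measurements.

```python
import math


def per_core_demand(saturating_cores, bandwidth_gb_s):
    """Bandwidth one busy core draws, if `saturating_cores` just fill the bus."""
    return bandwidth_gb_s / saturating_cores


def cores_to_saturate(bandwidth_gb_s, demand_per_core_gb_s):
    """Smallest whole number of cores whose combined demand reaches the bandwidth."""
    return math.ceil(bandwidth_gb_s / demand_per_core_gb_s)


if __name__ == "__main__":
    demand = per_core_demand(6, 81.25)        # roughly 13.5 GB/s per core
    busy = cores_to_saturate(56.25, demand)   # cores kept fed at 3600 MT/s
    print(busy, 16 - busy)                    # 5 11
```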
This is also reiterated at
http://hrtapps.com/blogs/20220427/ which similarly highlights the importance of memory bandwidth (in computational fluid dynamics in this case) and finds that:
"M1 Ultra has an extremely high 800 GB/sec memory bandwidth.... which leads to a level of CPU performance scaling that I don’t even see on supercomputers, and is the result of a SoC (system on a chip) design"
The new Intel Raptor Lake CPUs also have only 2 memory channels and top out at about 120GB/s of memory bandwidth (heavily overclocked), so there won't be a difference here.
So just a heads up: the new Ryzen/Intel CPUs are good for gaming and for workflows that don't depend much on memory, but if you're doing data analysis or other scientific HPC work that is CPU-and-memory bound (thus not GPU machine learning), you'll very quickly run into memory bandwidth limits. In that case you'd better stick to Apple's M1 / M2 chips or to AMD / Intel CPUs with more than 2 memory channels and thus more memory bandwidth. Those are also way more expensive: the AMD ThreadRipper Pro 5965WX, for example, has 24 cores and 8 memory channels at 200GB/s max memory bandwidth, but you have to pay $2400 just for the chip itself and $1000 for a compatible motherboard.