
Kotsos81

macrumors member
Original poster
Dec 26, 2023
36
29
In general, there is a lot of activity in the heterogeneous computing area lately. Even AMD seems to be adopting this paradigm, with a combination of Zen4 and Zen4c cores. I find it a fascinating topic.

I have noticed that some mobile SoCs, such as Samsung Exynos 2100, Qualcomm Snapdragon 888 and others, have adopted a heterogeneous computing architecture with three types of CPU cores: High-performance or prime cores, performance cores, and (energy-)efficiency cores. I assume that this approach offers greater flexibility than the conventional hybrid CPU paradigm with two types of cores (i.e., P-cores and E-cores) and provides some benefits for portable devices with even tighter energy consumption constraints than laptops, such as smartphones.

I was wondering, though, if this strategy also makes sense for laptops and if we may see it in future Macs. (I am not aware of a laptop CPU that utilizes this method. Maybe there is a reason for that...)

So, what do you think? Could laptops benefit from hybrid CPUs with 3 types of cores? Do you believe that we will see such implementations in the future?
 
  • Like
Reactions: splifingate

bcortens

macrumors 65816
Aug 16, 2007
1,229
1,568
Ontario Canada
In general, there is a lot of activity in the heterogeneous computing area lately. Even AMD seems to be adopting this paradigm, with a combination of Zen4 and Zen4c cores. I find it a fascinating topic.

I have noticed that some mobile SoCs, such as Samsung Exynos 2100, Qualcomm Snapdragon 888 and others, have adopted a heterogeneous computing architecture with three types of CPU cores: High-performance or prime cores, performance cores, and (energy-)efficiency cores. I assume that this approach offers greater flexibility than the conventional hybrid CPU paradigm with two types of cores (i.e., P-cores and E-cores) and provides some benefits for portable devices with even tighter energy consumption constraints than laptops, such as smartphones.

I was wondering, though, if this strategy also makes sense for laptops and if we may see it in future Macs. (I am not aware of a laptop CPU that utilizes this method. Maybe there is a reason for that...)

So, what do you think? Could laptops benefit from hybrid CPUs with 3 types of cores? Do you believe that we will see such implementations in the future?
Zen4 and Zen4c are really getting press because on the Windows side of things it feels like the schedulers are still behind the times, so having two cores that are identical other than clock speed is really helpful there. On macOS, iOS, and Android, schedulers have been aware of different core types for a long time, and having different core designs allows them to better tailor each core to its intended role (performance vs. efficiency).

Apple doesn't really do a small core the way Android does, and Intel's efficiency cores are nearly as transistor heavy as Apple's P-cores. Further, Apple's efficiency cores are usually far more potent than Android's smallest cores, and they often approach, but don't quite match, the performance of Android mid cores while consuming far less power. Could Apple add another core tier? Maybe, but I don't see much motivation at this time.

I think heterogeneous computing would be more interesting if we had a more uniform programming toolchain that allowed coding for either the CPU or GPU within the same programming language and even within the same file. Imagine a compiler flag that tells the compiler the code should run on the GPU, the CPU, or both, depending on which the runtime thinks is better. That would be cool. Right now, however, GPU cores are so different that it takes careful thinking about data structures and control flow to make sure you get the best use of the GPU.
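
To make the "let the runtime decide" idea a bit more concrete, here is a minimal sketch in plain Python. Everything in it is hypothetical: the `Backend` class, the `pick_backend` function, and especially the cost numbers are invented for illustration and do not describe any real toolchain or real hardware. It only shows the shape of the decision: a GPU-like backend with a high fixed launch cost and a low per-item cost wins only once the problem is large enough.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Backend:
    """A compute target described by a crude, made-up cost model."""
    name: str
    launch_overhead_us: float  # fixed cost to start any work (assumed value)
    per_item_us: float         # cost per processed element (assumed value)

    def estimated_cost_us(self, n_items: int) -> float:
        return self.launch_overhead_us + self.per_item_us * n_items


# ASSUMPTION: illustrative numbers only, not measurements of any real chip.
CPU = Backend("cpu", launch_overhead_us=1.0, per_item_us=0.010)
GPU = Backend("gpu", launch_overhead_us=50.0, per_item_us=0.001)


def pick_backend(n_items: int, backends=(CPU, GPU)) -> Backend:
    """Return the backend with the lowest estimated cost for this problem size."""
    return min(backends, key=lambda b: b.estimated_cost_us(n_items))


for n in (1_000, 100_000):
    print(n, "->", pick_backend(n).name)
```

With these invented numbers, small jobs stay on the CPU and large ones move to the GPU. A real runtime would also have to account for data transfer, memory layout, and divergent control flow, which is exactly the part that still needs careful thinking today.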
 

Sydde

macrumors 68030
Aug 17, 2009
2,552
7,050
IOKWARDI
So, what do you think? Could laptops benefit from hybrid CPUs with 3 types of cores? Do you believe that we will see such implementations in the future?
As far as Apple goes, it is unlikely that they will go 3-tier. A lot of effort goes into Apple's small cores. Unlike the ARM 500 series E-cores, Apple's E-cores are out-of-order (which is better than in-order), yet they provide efficiency that is at least as good as 5xx alongside performance that rivals the 7xx mid-tier cores. It would make no sense for Apple to go with a third intermediate core, because it does not fit their use profile even for portables.
 

splifingate

macrumors 65816
Nov 27, 2013
1,255
1,054
ATL
I am not aware of a laptop CPU that utilizes this method

Intel Core Ultra is available now, I believe, and it's trinary (albeit in a P-core, E-core and LPE-core design):


The rPi 4B+ was my first big.LITTLE machine. I now use an M2 Max Studio . . . which is not--of course--exactly comparable ;)

Earlier this Winter, I sourced a fan-less soft router (directly from CWWK (Guangdong/Hong Kong area)).

It's based around an Alder Lake N305, with 8 Gracemont efficiency cores (8-core/8-thread).

The performance is phenomenal (3TB NVMe/48GB DDR5/4x2.5G eth. Proxmox virtualization host) . . . all in a 12-25 watt envelope.

I was astounded to be able to do a Gentoo "emerge -auDN @world" in about the same amount of time that it last took on my 12-core/24-thread x5675 Dell T5500 🤷‍♂️

Of course, I am completely astounded that the M2 Max Studio does all that it does in roughly the same power envelope as the N305 E-core system.

I could fit both five times over in the power envelope of the T5500 (which is why I'll probably not be using it much any more) <smile>

All that being said, it's amazing how we can do so much more with less.
 
  • Like
Reactions: Kotsos81

leman

macrumors Core
Oct 14, 2008
19,237
19,135
I was wondering, though, if this strategy also makes sense for laptops and if we may see it in future Macs. (I am not aware of a laptop CPU that utilizes this method. Maybe there is a reason for that...)

So, what do you think? Could laptops benefit from hybrid CPUs with 3 types of cores? Do you believe that we will see such implementations in the future?

I think this has more to do with the state of the art different companies bring to the table. Let's take Intel. Their P-core is very fast, but also very large and power hungry. Their E-core is compact and delivers reasonable performance per mm² and per watt, but it's not ultra-low-power either. It's good for reaching high compute throughput on easily parallelizable tasks, not really useful for conserving battery life (E-cores still draw around 10 watts, that's 2x more than Apple's performance cores!). So Intel have now introduced a new type of low-power core for running background tasks and conserving battery life. Android manufacturers follow a similar strategy, except that they historically had good ultra-low-power cores and good medium-power/efficient cores, but lacked extreme performance cores. So what we see happening now in the phone space is that one adds one or two ultra-fast (but power hungry) cores to get that extra single-threaded performance for tasks that benefit from it. AMD is interesting because Zen4c is more of a compact version of the same architecture, so it's less asymmetrical than others (as mentioned in #2 above).

Talking about Macs, however, well, Apple's P-cores are already as fast as Intel's fastest, while being 2x smaller and consuming 5-6x less power. And their E-cores are ultra-low-power, while delivering some respectable performance (still not as fast as Intel's E-cores, but getting there). So Apple simply doesn't need the three-tier strategy, not yet at least. They can simply keep adding P-cores, which fulfill the same roles as P- and E-cores in Intel's designs. What we do see is recent Apple designs increasing the importance of E-cores, which as of M3 do offer decent compute throughput, especially when it comes to numerical processing.
 
  • Like
Reactions: meson and Kotsos81

jdb8167

macrumors 601
Nov 17, 2008
4,732
4,429
As far as Apple goes, it is unlikely that they will go 3-tier. A lot of effort goes into Apple's small cores. Unlike the ARM 500 series E-cores, Apple's E-cores are out-of-order (which is better than in-order), yet they provide efficiency that is at least as good as 5xx alongside performance that rivals the 7xx mid-tier cores. It would make no sense for Apple to go with a third intermediate core, because it does not fit their use profile even for portables.
I think the main reason that Apple has no need for three tiers is that they already have a third tier. Apple's SoC business model is distinctly different because, unlike the rest of the industry, they have no need to sell chips to OEMs. So when Apple wants a very high-efficiency, very low-power core, they design a custom core (ASIC) and run a proprietary embedded OS (RTOS) on it along with whatever functional code they need. All of this is designed into and embedded on each SoC; it is just more transistor budget for their A/M-series SoCs. Intel/AMD/Qualcomm/Samsung can't easily do this with SoCs destined for OEM designs, since all those designs have different needs and different OSes.
 

Sydde

macrumors 68030
Aug 17, 2009
2,552
7,050
IOKWARDI
(Apple's) E-cores are ultra-low-power, while delivering some respectable performance (still not as fast as Intel's E-cores, but getting there).
The article cited by @splifingate (or one that is linked from it) seems to suggest that is not really the case. The author says that when you set a thread to the lowest QoS, the AS E-cores run at around 0.780GHz, which is more than a hundred MHz slower than *Lake E-cores. It looks like they are at least as fast (probably faster) if a thread requests a QoS just below what triggers P-core dispatch.
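
As a rough way to picture the QoS behaviour described in this post, here is a toy model in Python. It is a sketch under loud assumptions, not how the macOS scheduler actually works: the `QoS` names only loosely mirror the macOS QoS classes (the default class is left out), and `place_thread`, `Placement`, and the `p_threshold` cut-off are hypothetical. The only behaviour taken from the post is that the lowest QoS lands on E-cores at a low clock, while a QoS just below the P-core cut-off gets E-cores at a higher clock.

```python
from enum import IntEnum
from typing import NamedTuple


class QoS(IntEnum):
    """QoS levels from lowest to highest (names loosely mirror macOS classes)."""
    BACKGROUND = 0
    UTILITY = 1
    USER_INITIATED = 2
    USER_INTERACTIVE = 3


class Placement(NamedTuple):
    cluster: str  # "E" or "P"
    clock: str    # qualitative label only, not a measured frequency


def place_thread(qos: QoS, p_threshold: QoS = QoS.USER_INITIATED) -> Placement:
    """Toy placement rule. ASSUMPTION: p_threshold is an invented cut-off."""
    if qos >= p_threshold:
        return Placement("P", "high")
    if qos == QoS.BACKGROUND:
        # Lowest QoS: E-cores held at a low clock, as the post describes.
        return Placement("E", "low")
    # Below the P-core cut-off but above background: E-cores, clocked up.
    return Placement("E", "high")


for level in QoS:
    print(level.name, place_thread(level))
```

The real scheduler also weighs load, thermals, and cluster availability, so treat this purely as a mental model for why "E-core speed" depends on the QoS a thread asks for.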
 

leman

macrumors Core
Oct 14, 2008
19,237
19,135
The article cited by @splifingate (or one that is linked from it) seems to suggest that is not really the case. The author says that when you set a thread to the lowest QoS, the AS E-cores run at around 0.780GHz, which is more than a hundred MHz slower than *Lake E-cores. It looks like they are at least as fast (probably faster) if a thread requests a QoS just below what triggers P-core dispatch.

My understanding is that the cores themselves are still a bit smaller than Intel's (in terms of compute units), although M3 added one more FP unit, which probably puts it ahead. But maybe my knowledge is outdated. Apple's newer E-cores do run at > 2 GHz, matching Intel's E-cores in sustained operation. It would be interesting to see some benchmarks. If Apple indeed manages to match Intel E-core performance with their 0.5 watts per core power draw, that would be insanely impressive.
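
For a sense of scale, the arithmetic behind "insanely impressive" can be sketched in a few lines of Python. The inputs are the rough figures quoted in this thread, not measurements: about 0.5 W per Apple E-core, and the "around 10 watts" Intel E-core figure from earlier, which is treated here as a per-core number (an assumption, since the earlier post does not say). Equal performance is also assumed, which is the hypothetical being discussed. The `efficiency_ratio` helper is made up for this example.

```python
def efficiency_ratio(perf_a: float, watts_a: float,
                     perf_b: float, watts_b: float) -> float:
    """How many times better core A's performance per watt is than core B's."""
    if watts_a <= 0 or watts_b <= 0:
        raise ValueError("power draw must be positive")
    return (perf_a / watts_a) / (perf_b / watts_b)


# ASSUMPTIONS: equal performance (1.0 vs 1.0) and the thread's rough wattages.
apple_e_watts = 0.5   # per-core figure quoted in the post above
intel_e_watts = 10.0  # "around 10 watts" from earlier, assumed to be per core

ratio = efficiency_ratio(1.0, apple_e_watts, 1.0, intel_e_watts)
print(f"Apple E-core perf/W advantage at equal performance: about {ratio:.0f}x")
```

Under those assumptions the gap is roughly 20x in performance per watt; if the 10 W figure actually covers a whole E-core cluster, the per-core gap would be correspondingly smaller.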
 

Kotsos81

macrumors member
Original poster
Dec 26, 2023
36
29
I found a recent series of articles that deal with evaluating the performance and understanding the behavior of M3 Pro 12C P-cores and E-cores and comparing the results and conclusions with those associated with M1 Pro 10C.

Part 1 is about general performance: M3 Pro General Performance
Part 2 provides a closer look at power and energy: M3 Pro Power and Energy
Part 3 presents the special CPU modes: M3 Pro Special CPU Modes
Part 4 focuses on vector processing in NEON: M3 Pro Vector Processing in NEON
Part 5 is concerned with the AMX co-processors: M3 Pro AMX
Summary: M3 Pro Evaluation Summary

The initial part on AMX was somewhat inconclusive, therefore a more detailed follow-up was added: M3 Pro AMX Follow-Up

There are also two other articles on the characteristics of the M3 CPU, one more general and another one more involved.

General M3 Characteristics: M3 Characteristics - General
Detailed M3 Characteristics: M3 Characteristics - Detailed

Lastly, there is an article comparing Accelerate performance on AS vs. Intel cores: Accelerate Performance - AS vs. Intel
I find these very interesting reads that validate the major advancements that AS made in just two years and provide useful insights.
 
  • Like
Reactions: Chuckeee

Sydde

macrumors 68030
Aug 17, 2009
2,552
7,050
IOKWARDI
My understanding is that the cores themselves are still a bit smaller than Intel's (in terms of compute units)
AIUI, e-cores have about half the ROB of p-cores and thus a much smaller rename pool. Presumably, keeping track of far fewer in-flight ops does not require as much logic, and the micro-ops themselves do not need as much tagging overhead. Perhaps the unified cache strategy in the new GPU will leak over into the CPU cores to gain even more efficiency.
 