Article is not correct - it's either 15% speed gain OR 30% less power. Never both - people always seem to confuse this.
Not exactly.
What you are saying is true for an individual transistor, or a collection of identical transistors, but it isn't necessarily true for a design.
A good design will use a range of transistors, from the fastest that are appropriate, sitting in critical paths, to the lowest power that can still do the job, sitting in non-critical paths. The next version of the design can run its "fast" transistors 15% faster and its "low power" transistors at 30% lower power, and achieve gains on both axes. To the extent that the critical path transistors control most of performance, you can be 15% faster, while, to the extent that much (not most, that would be overstating it) of the power is dissipated in the low power transistors, that power is down by 30%. You might end up with something like a device that is 12% faster at 18% lower power.
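To make the arithmetic concrete, here is a rough back-of-envelope sketch of that blending. The function name and the two "share" parameters are made up for illustration, and the values 0.8 and 0.6 are assumptions picked only because they reproduce the 12%/18% ballpark; they are not measurements of any real chip.

```python
def blended_gains(speed_gain, power_saving,
                  perf_share_critical, power_share_low_power):
    """Toy linear model of a design that mixes fast and low-power transistors.

    speed_gain            -- process speed-up at equal power (0.15 for 15%)
    power_saving          -- process power cut at equal speed (0.30 for 30%)
    perf_share_critical   -- ASSUMED fraction of performance set by the
                             critical-path ("fast") transistors
    power_share_low_power -- ASSUMED fraction of total power burned in the
                             non-critical ("low power") transistors
    """
    for share in (perf_share_critical, power_share_low_power):
        if not 0.0 <= share <= 1.0:
            raise ValueError("shares must be between 0 and 1")
    # Fast transistors take the speed option and keep their power;
    # low-power transistors take the power option and keep their speed.
    perf_gain = perf_share_critical * speed_gain
    power_cut = power_share_low_power * power_saving
    return perf_gain, power_cut


# Illustrative shares only (assumptions, not data).
perf, power = blended_gains(0.15, 0.30, 0.8, 0.6)
print(f"{perf:.0%} faster at {power:.0%} lower power")
```

Setting both shares to 1.0 gives the full 15% and 30%, which no real design would hit; anything in between lands on both axes at once, which is the point.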
.....
Even that understates the case because it assumes a pure port of an existing design. Smarter would be a revised design that plays to the strengths of the new process, most likely by making even more use of stored state (larger caches, branch predictors, etc). Even better (but much more difficult for an outsider to judge) are situations where the new process allows for "crossing a cliff" -- the way power vs frequency played out with the old process, the maximum width of your renamer at the frequency you want was 8-wide, but the new process allows you to bump this to 10-wide without that either becoming an unbearable hot-spot or forcing the clock too slow.
My guess (only a guess) goes as follows:
- Apple seems to have a "complete" design cycle time of about 4 years, meaning that the pattern to expect is something like:
- first design, 3 iterations, next design, 3 iterations, ...
- we've been through two rounds of this (A7/8/9/10 6-wide, with constant refinement; then A11/12/13/14 now 8-wide, designed from the start as asymmetric with two core complexes, little core a reparameterization of the big core)
- in both cases the first member of the set was designed with all the most difficult parts scaled correctly, and then extra pieces filled in via refinement. So A7 was 6-wide, but subsequent versions added more FP pipelines, a second int multiplier and so on
- so my guess is that (assuming the cycle wasn't delayed a year because of all the additional complexities around the x86/mac transition) A15 will be a different design *at the lowest levels*. It may not superficially look different because only the bones have changed; it will be the A16 and later that add onto those bones. So, eg, width may be increased to 10-wide, but not the number of execution units. 4xNEON may be reconfigured to 2xSVE256, but we don't actually get more basic FMAC capability. Virtual registers are added -- but no additional physical registers yet. etc etc
- point is, this is what I mean by redesign. The design looks mostly unchanged: same frequency, maybe 20% better IPC, and the main visible change is SVE and ARMv9 capabilities (MTE, BTI etc). BUT the new design has exploited BOTH the 15% speed boost and the 30% lower power to rework all the most critical areas (rename, schedule) to operate more aggressively, and we then see that play out in a further 15% IPC boost over the next three cores, which will occasionally take a process boost primarily in frequency.