Previous thread title (for posterity and honesty about previous prediction): Analysis: Apple's A7 is quad core CPU, quad cluster GPU built on TSMC 28nm process

About a week ago, I posted an Apple A-series SoC history thread and A7 prediction that you can find here. I'll avoid boring you with all of the details of that (lengthy) thread and say that I predicted A7 would be a modified version of the A6 CPU cores with a new "Rogue" GPU from ImgTec built on Samsung's 28nm process. I wasn't alone in that prediction: Brian Klug and Anand Shimpi from AnandTech also figured it would go that way. If you read AnandTech's live blog, you could see that they were quite shocked that A7 was announced to be 64-bit, even with the rumors coming out last week. So was I.

What We Learned from the Keynote

So here's what we know from today's keynote: the Apple A7 is slightly larger than the A6, at 102 mm^2 versus 96 mm^2. We also know that it has roughly double the transistors (around 1 billion), according to a direct quote from Phil Schiller during the keynote. I think it's safe to assume the factor is at least 1.8x to make that kind of claim. We also know that they're claiming 2X CPU and 2X GPU performance from A6 to A7.

What We Know about Apple's Current Designs and Practices

First, if the die sizes are roughly the same, how are they getting twice the number of transistors in there? We do know a few things about Apple A-series SoC designs. They've been getting progressively more custom from the A4 to the A6, with the A6 having fully hand-designed CPU cores. This is opposed to the "place and route" approach, where companies let a CAD tool automatically floorplan the device based on a functional description of the processor and standard library blocks. Going full custom allows you to get denser because you're manually designing, but it also takes a lot more time, which is why almost no one does it.
There was also a significant amount of analog circuitry redesign in the single core A5 seen in the latest AppleTV revision that let the die size decrease by almost 50%, though the removed A5 CPU core did play a big part in this.

64 Bit-ter is Bigger is Better

It's likely that this custom CPU core design and custom analog circuitry design has a part to play. However, a 64 bit CPU core design is guaranteed to be larger than a 32 bit core design because you're increasing the width of your data path and execution units. Apple also said that they doubled the number of general purpose and floating point registers. This gels with ARM's AArch64 64 bit standard architecture that goes along with the new ARMv8 64 bit instruction set architecture (ISA). The L1 and L2 cache sizes are also likely larger. While we can't know if Apple used the reference design (as they did in A4 and A5, but not A6), we know they probably had incentive not to, since ARM has been accused of server aspirations with this architecture, and AMD has announced CPUs built on ARM 64 bit parts doing just that. ARM's A15 was similarly criticized for being unnecessarily big, which is why Qualcomm did their own custom "Krait" implementation of the ARMv7s ISA (which Krait and Swift implement) to achieve a better power/performance ratio for mobile devices. That strategy is also a big reason why Qualcomm has most of the major design wins in North America.

Performance Increase Claims

Now that we've established that the individual CPU cores have to be bigger and that Apple is probably saving themselves a little area with the custom analog design they honed on the A5 AppleTV revision, let's talk performance numbers. If you go back to previous keynotes, Apple loves to make CPU and GPU claims with integer multiples. The good thing is that they're not just marketing fluff. They backed up their A5 -> A6 2x increase with actual benchmarks showing it was true. With GPUs, it's been even easier.
When they claimed a 9x improvement from A4 to A5, it was a direct ratio of the FLOPs (floating point operations per second) ratings of their GPUs. Because of that, we know the A7 GPU should have twice the FLOPs of the A6 GPU. Lacking benchmarks for the A7 CPU, we'll have to dig a little deeper.

GPU Improvements

The GPU claim leads us to some easy conclusions. Since the FLOPs rating has to be 2x, we can do some easy math. Apple also proudly announced OpenGL ES 3.0 compliance. This rules out the ImgTec Series 5 GPUs that they've been using since the 3GS. It's also highly likely that they'll stay with ImgTec, since Apple owns a 10% stake in the company. Given the A6 GPU rating of 34.6 GFLOPs, we know we have to get to 69.2 GFLOPs. There are two logical options to do that. The first is ImgTec's G6200/G6230 "dual-cluster" GPU, which would need to be clocked at about 540 MHz to hit 69.2 GFLOPs. The second is ImgTec's G6400/G6430 "quad-cluster" GPU, which would need half the clock rate, 270 MHz, since it has double the execution units. There is a "hex-cluster" option, but I'm dismissing that as too big. Given that the only announced Rogue products have their GPU frequencies in the 200-300 MHz range, I'm assuming Apple will use the G6430, since their GPU and CPU clock speeds tend to lag other high-end offerings in favor of saving power with larger designs. ImgTec baselines their Rogue GPUs at 600 MHz, but the lack of announced products at these frequencies makes that seem more like a far-off goal. ImgTec has some wild claims about the Rogue architecture, one of which is that it is 20x more efficient than previous cores. In any event, it seems logical that we are at worst spending 2x the transistors to get 2x the FLOPs. On the A6, the CPU and GPU cores made up 33% of the die area. We'll get back to that.

CPU Improvements

Going back to the CPU claims, we can dig into the 2x claim by examining the relative performance of ARM's stock Cortex line cores.
Dhrystone is a benchmark used on CPU cores to measure their performance. ARM gives their core performances in DMIPs/MHz, which is Dhrystone millions of instructions per second per megahertz. It's basically a measure of instructions executed per cycle (IPC). The stock A9 core was (at least) 2.5 DMIPs/MHz, which Apple used at 800 MHz in the A5 found in the 4S. The stock A15 core is (at least) 3.5 DMIPs/MHz. Apple did not use this in the A6, but their implementation had a similar number of pipeline stages to Krait (11 vs 12). Because of this, they are assumed to have roughly the same IPC. Krait variants range from 3.3 to 3.4 DMIPs/MHz. Let's assume 3.4 to make it stronger. Keep in mind these are "theoretical" numbers that don't happen in reality. But since we're comparing two theoretical numbers, it's a fair assumption that the ratio may be close to the actual performance ratio too.

So, 3.4 DMIPs/MHz divided by 2.5 DMIPs/MHz gives us a factor of 1.36, or 36% faster clock for clock. However, the A6 CPU speed is 1300 MHz versus 800 MHz in the A5, a factor of 1.625, or 62.5% faster. If we multiply the 1.36 by the 1.625, we get 2.21. Pretty close to the 2X Apple claimed for A5 to A6 (they actually showed up to 2.1 in their bench). Looks like the 3.3 to 3.4 DMIPs/MHz is pretty close for A6. We'll use 3.3 to be fairer to the A7 on its 2x burden.

ARM's A57 64 bit core is listed at (at least) 4.1 DMIPs/MHz. Dividing that by 3.3 DMIPs/MHz gives us a 1.24 factor. To get to 2x, we need roughly a 61% clock increase (2 / 1.24 = 1.61). That would give us a clock speed of about 2.1 GHz. I find this unlikely because Apple has historically trailed their competitors in raw clock speed. Where competitor SoCs are 1.5 to 1.8 GHz (now 2.3 GHz with the Krait cores in Snapdragon 800), Apple has opted for lower clock speeds because their batteries have been much smaller. The iPhone battery is anywhere from 33% to 50% smaller than those of Android competitors. It gets away with this by having a smaller display, better power management and lower clock speeds.
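The GPU and CPU arithmetic so far can be checked with a short Python sketch. This is only a back-of-envelope aid, not anything Apple or ImgTec published: the function names are made up for illustration, and the 64 FLOPs per cluster per clock figure is an assumption inferred from the numbers above (69.2 GFLOPs at about 540 MHz across 2 clusters). Everything else is taken straight from the figures quoted in this post.

```python
# Back-of-envelope check of the GPU clock and CPU speedup estimates.
# Inputs are the figures quoted in the post; helper names are illustrative.

A6_GPU_GFLOPS = 34.6
# Assumption: inferred from 69.2 GFLOPs / ~540 MHz / 2 clusters, not a quoted spec.
FLOPS_PER_CLUSTER_PER_CLOCK = 64


def required_gpu_clock_mhz(target_gflops, clusters):
    """Clock in MHz needed for `clusters` Rogue clusters to reach target_gflops."""
    flops_per_clock = clusters * FLOPS_PER_CLUSTER_PER_CLOCK
    return target_gflops * 1e9 / flops_per_clock / 1e6


def relative_speed(dmips_new, dmips_old, mhz_new, mhz_old):
    """Theoretical speedup: (DMIPs/MHz ratio) times (clock ratio)."""
    return (dmips_new / dmips_old) * (mhz_new / mhz_old)


def clock_needed_mhz(target_factor, dmips_new, dmips_old, mhz_old):
    """Clock in MHz the new core needs to reach target_factor over the old one."""
    return mhz_old * target_factor / (dmips_new / dmips_old)


if __name__ == "__main__":
    target = 2 * A6_GPU_GFLOPS  # Apple's 2x GPU claim
    for clusters in (2, 4):
        mhz = required_gpu_clock_mhz(target, clusters)
        print(f"{clusters} clusters: {mhz:.0f} MHz")

    # A5 (stock A9, 800 MHz) -> A6 (Krait-like IPC, 1300 MHz)
    print(f"A5 -> A6: {relative_speed(3.4, 2.5, 1300, 800):.2f}x")

    # A6 (3.3 DMIPs/MHz, 1300 MHz) -> single A57-class core hitting 2x
    print(f"A7 clock for 2x: {clock_needed_mhz(2.0, 4.1, 3.3, 1300):.0f} MHz")
```

Changing the DMIPs/MHz inputs (for example 3.4 instead of 3.3 for the A6) shows how sensitive the required clock is to the assumed IPC.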
So if Apple's A7 isn't 2.1 GHz, how does it get to 2x? It can either have a more sophisticated core (ARM claims A57 variants can be up to 4.76 DMIPs/MHz), or more cores. Since Apple's custom A6 was below the stock A15 in IPC, I'm assuming the A7 is likewise below the stock A57. That gives us triple or quad core. I am assuming triple is out because no one has done a CPU with an odd number of cores. Apple has had references to quad cores show up in iOS betas, which likely means they've been testing them for a while. That's why I've come to the conclusion that Apple has finally made the jump to quad core. This also helps us get to the goal of 2x transistors.

New Components, Increased Complexity and Transistor Density

Knowing that the quad cluster GPU may be 2x the GPU transistors, and that the CPU is at least 2x the transistors given that a 64 bit core is going to be more complicated, shouldn't we be over an overall 2x increase in transistors? Not really. Remember that 33% number? That's how much of the die space the CPU & GPU took up on the A6. While transistor density is not at all uniform across the die, it stands to reason that these blocks, as the main source of transistors, are what would need to double to take us to an overall 2x figure. We are helped by the fact that there are many things that don't have to increase in complexity, like I/O interfaces and memory controllers if the A7 is staying with a 64 bit memory interface (the iPhone A-series SoCs have had 64 bit memory interfaces for a while. The X series parts use 128 bit, which is more complex, but they don't have memory inside the package like the iPhone parts do. We know from the leaked 5S PCB that A7 does). There's also a fair chance some circuitry has moved off of the A7 SoC and into the M7 chip, since many of the phone sensors no longer directly require the A7 to function.

The Die Shrinkage

Ok, maybe you buy all of that. How do we get 2x the transistors in roughly the same space (96 vs 102 mm^2)? The first is by a die shrink.
The A6 is manufactured by Samsung on a 32nm process. General news about their fabs leads us to believe that 28nm is ready now. However, this will only get us a 20% density increase at the most optimistic estimate (you can't scale dimensions linearly or by the square, because 32nm vs 28nm refers to one dimension of a transistor, with the other not necessarily scaling linearly. Also, 32nm is a "full" node and 28nm is a "half" node. The simple answer from that is that those don't scale linearly either). Even with a massive custom circuitry undertaking and a 20% density increase from process change, the 2x factor still seems unlikely. 20nm isn't ready at any fab Apple could use either, so that's out of the question.

Changing Foundries

So, how do we get the rest? TSMC. TSMC is known for having denser processes at the same feature size. This can easily be seen by comparing standard ARM cores and their die sizes across processes. TSMC is noted for having a 20% or better density advantage. So, if we compound the two 20% density improvements (1.2 x 1.2 = 1.44), we get to about 1.5x. This is about the best we can do with simple heuristics. We don't know how much custom circuitry Apple will do to further improve density. It would be overly laborious and likely fruitless to try to weigh die share versus circuitry density (CPU and GPU) to get an overall idea. In any event, it seems obvious that the move to 28nm and the move to TSMC are both necessary to fit the claimed 2x transistors in roughly the same area.

Summary

But Apple won't use TSMC until A8, you say! Well, the A7 leak had a new chip letter identifier that suggested a different fab. When MacRumors consulted Chipworks about this change, Chipworks suggested that it meant the chip was made by TSMC. So, that seems enough of a smoking gun to me.

Note: The amount of RAM is expected to be the same, and also from Elpida, based on the picture of the A7 die (as noted by MacRumors).
I expect them to change from LPDDR2 to LPDDR3, however, as all high performance mobile SoCs are doing these days.
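For anyone who wants to poke at the density argument from the die shrink and foundry sections, here is a rough Python sketch of that arithmetic. The function names are made up for illustration, and the two 20% gains are the optimistic heuristics used above, not measured figures. It compares the density gain needed to fit 1.8x to 2x the transistors into 102 mm^2 instead of 96 mm^2 against what the process shrink and the foundry change would give when compounded.

```python
# Rough check of the transistor density argument.
# Die areas and the 20% gains are the post's own figures/heuristics.

A6_AREA_MM2 = 96
A7_AREA_MM2 = 102


def required_density_gain(transistor_ratio, old_area_mm2, new_area_mm2):
    """Density multiple needed to fit transistor_ratio times the transistors."""
    return transistor_ratio * old_area_mm2 / new_area_mm2


def compound(*gains):
    """Multiply independent density gains together (1.2 means +20%)."""
    total = 1.0
    for gain in gains:
        total *= gain
    return total


if __name__ == "__main__":
    # Assumed gains: 32nm -> 28nm shrink, then Samsung -> TSMC at the same node.
    shrink_and_foundry = compound(1.20, 1.20)
    print(f"shrink + foundry: {shrink_and_foundry:.2f}x")

    for ratio in (1.8, 2.0):
        needed = required_density_gain(ratio, A6_AREA_MM2, A7_AREA_MM2)
        leftover = needed / shrink_and_foundry
        print(f"{ratio}x transistors: need {needed:.2f}x density, "
              f"{leftover:.2f}x left for custom layout and other changes")
```

Whatever factor is left over after the two 20% gains is the part that would have to come from custom circuitry or from blocks moving off the die, which is the piece the heuristics above can't pin down.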