Confirmed: Apple's A7 is 1.3 GHz dual core CPU, quad cluster GPU. Samsung 28nm HKMG.

Discussion in 'iPhone' started by chrmjenkins, Sep 11, 2013.

  1. chrmjenkins, Sep 11, 2013
    Last edited: Sep 27, 2013

    chrmjenkins macrumors 603


    Oct 29, 2007
    Previous thread title (for posterity and honesty about previous prediction): Analysis: Apple's A7 is quad core CPU, quad cluster GPU built on TSMC 28nm process

    About a week ago, I posted an Apple A-series SoC history thread and A7 prediction that you can find here.

    I'll avoid boring you with all of the details of that (lengthy) thread and say that I predicted A7 would be a modified version of the A6 CPU cores with a new "Rogue" GPU from ImgTec built on Samsung's 28nm process. I wasn't alone in that prediction: Brian Klug and Anand Shimpi from AnandTech also figured it would go that way. If you read AnandTech's live blog, you could see that they were quite shocked that A7 was announced to be 64-bit, even with the rumors coming out last week. So was I.

    What We Learned from the Keynote

    So here's what we know from today's keynote: the Apple A7 is slightly larger than the A6, at 102 mm^2 versus 96 mm^2. We also know that it has roughly double the transistors (around 1 billion), according to a direct quote from Phil Schiller during the keynote. I think it's safe to assume the factor is at least 1.8x to make that kind of claim. We also know that they're claiming 2X CPU and 2X GPU performance from A6 to A7.

    What We Know about Apple's Current Designs and Practices

    First, if the die sizes are roughly the same, how are they getting twice the number of transistors in there? We do know a few things about Apple A-series SoC designs. They've been getting progressively more custom from the A4 to the A6, with the A6 having fully hand-designed CPU cores. This is opposed to the "place and route" approach, where companies allow a CAD tool to automatically floorplan their device based on their functional description of the processor and the use of standard library blocks. Going full custom allows you to get denser because you're manually designing, but it also takes a lot more time, which is why almost no one does it. There was also a significant amount of analog circuitry redesign in the single core A5 seen in the latest AppleTV revision that let the die size decrease by almost 50%, although the removal of one A5 CPU core also played a big part in this.

    64 Bit-ter is Bigger is Better

    It's likely that this custom CPU core design and custom analog circuitry design have a part to play. However, a 64 bit CPU core design is guaranteed to be larger than a 32 bit core design because you're increasing the width of your data path and execution units. Apple also said that they doubled the number of general purpose and floating point registers. This gels with ARM's AArch64 64 bit standard architecture that goes along with the new ARMv8 64 bit instruction set architecture (ISA). The L1 and L2 cache sizes are also likely larger.

    While we can't know if Apple used the reference design (as they did in A4 and A5, but not A6), we know they probably had incentive not to, since ARM has been accused of server aspirations with this architecture, and AMD has announced CPUs built on ARM 64 bit parts doing just that. ARM's A15 was similarly criticized for being unnecessarily big, which is why Qualcomm did their own custom "Krait" implementation of the ARMv7s ISA (which Krait and Swift implement) to achieve a better power/performance ratio for mobile devices. That strategy is also a big reason why they have most of the major design wins in North America.

    Performance Increase Claims

    Now that we've established the individual CPU cores have to be bigger and Apple is probably saving themselves a little on custom analog design that they honed with their A5 AppleTV revision, let's talk performance numbers. If you go back to previous keynotes, Apple loves to make CPU and GPU claims with integer multiples. The good thing is that they're not just marketing fluff. They backed up their A5 -> A6 2x increase with actual benchmarks showing it was true. With GPUs, it's been even easier. When they claimed a 9x improvement from A4 to A5, it was a direct ratio of the FLOPs (floating point operations per second) rating of their GPUs. Because of that, we know the GPU should have twice the FLOPs of the A6 GPU. Lacking benchmarks for the A7 CPU, we'll have to dig a little deeper.

    GPU Improvements

    The GPU claim leads us to some easy conclusions. Since the FLOPs rating has to be 2x, we can do some easy math. Apple also proudly announced OpenGL ES 3.0 compliance. This throws out the ImgTec series 5 GPUs that they've been using since the 3GS. It's also highly likely that they'll stay with ImgTec since they are a 10% stake owner in the company. Given the A6 GPU rating of 34.6 GFLOPs, we know we have to get to 69.2 GFLOPs. There are two logical options for doing that. The first is ImgTec's G6200/G6230 "dual-cluster" GPU, which would need to be clocked at about 540 MHz to hit 69.2 GFLOPs. The second is ImgTec's G6400/G6430 "quad-cluster" GPU, which would need half the clock rate at 270 MHz since it has double the execution units. There is a "hex-cluster" option, but I'm dismissing that as too big.
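
    The clock figures above can be reproduced with a quick sketch. The 64 FLOPs per clock per cluster value is an assumption: it is simply what the post's own "dual-cluster at about 540 MHz" figure implies, not a number taken from an ImgTec datasheet, and the function name is made up for illustration.

```python
# Back-of-the-envelope GPU clock estimate using the figures quoted in the post.
A6_GPU_GFLOPS = 34.6                 # A6 GPU rating quoted in the post
TARGET_GFLOPS = 2 * A6_GPU_GFLOPS    # Apple's "2X GPU" claim -> 69.2 GFLOPs

# ASSUMPTION: 64 FLOPs per clock per Rogue cluster, back-derived from the
# post's "dual-cluster at about 540 MHz" statement (not a datasheet value).
FLOPS_PER_CLOCK_PER_CLUSTER = 64


def required_clock_mhz(target_gflops: float, clusters: int) -> float:
    """Clock in MHz needed for `clusters` clusters to reach `target_gflops`."""
    flops_per_clock = clusters * FLOPS_PER_CLOCK_PER_CLUSTER
    return target_gflops * 1e9 / flops_per_clock / 1e6


if __name__ == "__main__":
    for name, clusters in (("dual-cluster", 2), ("quad-cluster", 4)):
        mhz = required_clock_mhz(TARGET_GFLOPS, clusters)
        print(f"{name}: {mhz:.0f} MHz")
```

    Under that assumption, doubling the cluster count halves the required clock, which is the whole basis for preferring the wider, slower part.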

    Given that the only announced Rogue products have their GPU frequencies in the 200-300 MHz range, I'm assuming Apple will use the G6430, since their GPU and CPU clock speeds tend to lag other high-end offerings in favor of saving power with larger designs. ImgTec baselines their Rogue GPUs at 600 MHz, but the lack of announced products at this frequency makes it seem more like a far-off goal. ImgTec has some wild claims about the Rogue architecture, one of which is that it is 20x more efficient than previous cores. In any event, it seems logical that we are at worst spending 2x the transistors to get 2x the FLOPs. On the A6, the CPU and GPU cores made up 33% of the die area. We'll get back to that.

    CPU Improvements

    Going back to the CPU claims, we can dig into the 2x claim by examining the relative performance of ARM's stock Cortex line cores. Dhrystone is a benchmark used on CPU cores to measure their performance. ARM gives their core performances in DMIPs/MHz, which is Dhrystone Millions of Instructions per second per megahertz. It's basically a measure of instructions executed per cycle (IPC). The stock A9 core was (at least) 2.5 DMIPs/MHz, which Apple used at 800 MHz in the A5 found in the 4S. The stock A15 core is (at least) 3.5 DMIPs/MHz. Apple did not use this in the A6, but their implementation had a similar number of pipeline stages to Krait (11 vs 12). Because of this, they are assumed to have roughly the same IPC. Krait variants range from 3.3 to 3.4 DMIPs/MHz. Let's assume 3.4 to make it stronger. Keep in mind these are "theoretical" numbers that don't happen in reality. But since we are comparing two theoretical numbers, it's a fair assumption that the ratio may be close to the actual performance ratio too.

    So, 3.4 DMIPs/MHz divided by 2.5 DMIPs/MHz gives us a factor of 1.36, or 36% faster clock for clock. However, the A6 CPU speed is 1300 MHz versus 800 MHz in the A5, a factor of 1.625, or 62.5% faster. If we multiply the 1.36 by the 1.625, we get 2.21. Pretty close to the 2X Apple claimed for A5 to A6 (they actually showed up to 2.1 in their bench). Looks like the 3.3 to 3.4 DMIPs/MHz is pretty close for A6. We'll use 3.3 as the A6 baseline to be fairer when judging the 2x jump for A7.
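
    As a rough sketch of that arithmetic, relative performance is modeled here as the IPC ratio times the clock ratio. The DMIPs/MHz and clock values are the ones quoted in the post; the function name is just for illustration, and the model ignores memory, core count and everything else.

```python
# Relative CPU performance modeled as (IPC ratio) x (clock ratio).
# All figures are the theoretical ones quoted in the post.

def speedup(ipc_new: float, ipc_old: float, mhz_new: float, mhz_old: float) -> float:
    """Estimated speedup of the new core over the old one."""
    return (ipc_new / ipc_old) * (mhz_new / mhz_old)


A5_IPC, A5_MHZ = 2.5, 800     # stock Cortex-A9 figure, A5 clock in the 4S
A6_MHZ = 1300                 # A6 clock

# The post brackets the A6 core between 3.3 and 3.4 DMIPs/MHz (Krait-like).
a5_to_a6_high = speedup(3.4, A5_IPC, A6_MHZ, A5_MHZ)
a5_to_a6_low = speedup(3.3, A5_IPC, A6_MHZ, A5_MHZ)

if __name__ == "__main__":
    print(f"A5 -> A6 with 3.4 DMIPs/MHz: {a5_to_a6_high:.2f}x")
    print(f"A5 -> A6 with 3.3 DMIPs/MHz: {a5_to_a6_low:.2f}x")
```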

    ARM's A57 64 bit core is listed at (at least) 4.1 DMIPs/MHz. Dividing that by 3.3 DMIPs/MHz gives us a 1.24 factor. To get to 2x, we need roughly a 61% clock increase. That would give us a 2.1 GHz clock speed. I find this unlikely because Apple has historically trailed their competitors in raw clock speed. Where competitor SoCs are 1.5 to 1.8 GHz (now 2.3 GHz with Krait 800), Apple has opted for lower clock speeds because their batteries have been much smaller. The iPhone battery is anywhere from 33% to 50% smaller than Android competitors. It gets away with this by having a smaller display, better power management and lower clock speeds.
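
    The required clock can be sketched the same way. This assumes the core count stays at two and that performance scales linearly with clock and with DMIPs/MHz, which is the post's simplification rather than a measured behavior.

```python
# Clock the A7 would need to hit 2x the A6 on IPC and frequency alone.
A6_CLOCK_MHZ = 1300
A6_DMIPS_PER_MHZ = 3.3      # the post's conservative A6 estimate
A57_DMIPS_PER_MHZ = 4.1     # ARM's quoted minimum for the A57 core
TARGET_SPEEDUP = 2.0        # Apple's "2X CPU" claim

ipc_gain = A57_DMIPS_PER_MHZ / A6_DMIPS_PER_MHZ    # clock-for-clock gain
clock_factor = TARGET_SPEEDUP / ipc_gain           # what clock must contribute
required_clock_mhz = A6_CLOCK_MHZ * clock_factor

if __name__ == "__main__":
    print(f"IPC gain:       {ipc_gain:.2f}x")
    print(f"Clock factor:   {clock_factor:.2f}x")
    print(f"Required clock: {required_clock_mhz / 1000:.2f} GHz")
```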

    So if Apple's A7 isn't 2.1 GHz, how does it get to 2x? It can either have a more sophisticated core (ARM claims A57 variants can be up to 4.76 DMIPs/MHz), or more cores. Since Apple's custom A6 was below the stock A15 in IPC, I'm assuming A7 is too. That gives us triple or quad core. I am assuming triple is out because no one has done a CPU with an asymmetric number of cores. Apple has had references to quad cores show up in iOS betas, which likely means they've been testing them for a while. That's why I've come to the conclusion that Apple has finally made the jump to quad core. This also helps us get to the goal of 2x transistors, too.

    New Components, Increased Complexity and Transistor Density

    Knowing that the quad cluster GPU may be 2x the GPU transistors and the CPU is at least 2x transistors given a 64 bit core is going to be more complicated, shouldn't we be over an overall 2x increase in transistors? Not really. Remember that 33% number? That's how much of the die space the CPU & GPU took up on the A6. While the transistor density per unit area is not uniform for the die at all, it stands to reason that these parts being the main source of transistors doubling is what we would need to take us to an overall 2x figure. We are helped by the fact that there are many things that don't have to increase in complexity, like I/O interfaces and memory controllers if the A7 is staying with a 64 bit memory interface (the iPhone A-series SoCs have had 64 bit memory interfaces for a while. A 128 bit interface, which the X series parts use, would be more complex, but those parts don't have memory inside the package like the iPhone parts do. We know from the leaked 5S PCB that A7 does). There's also a fair chance some circuitry has moved off of the A7 SoC and into the M7 chip since many of the phone sensors no longer directly require the A7 to function.
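
    A toy model shows why doubling the CPU and GPU does not overshoot 2x overall. It assumes, purely for illustration, that a block's share of the transistors equals its share of the die area, which the post itself notes is not really true; the function names are hypothetical.

```python
# Toy model: how much the whole chip grows when only some blocks grow.
# ASSUMPTION: transistor share == area share (uniform density), which the
# post explicitly says does not hold on a real die.

CPU_GPU_SHARE = 0.33   # CPU + GPU share of the A6 die, per the post


def overall_factor(share: float, block_factor: float) -> float:
    """Overall transistor growth if `share` of the chip grows by `block_factor`."""
    return share * block_factor + (1.0 - share)


def block_factor_needed(share: float, overall: float) -> float:
    """Growth the `share` blocks would need alone to reach `overall` growth."""
    return (overall - (1.0 - share)) / share


if __name__ == "__main__":
    print(f"CPU+GPU doubled: {overall_factor(CPU_GPU_SHARE, 2.0):.2f}x overall")
    print(f"CPU+GPU growth for 2x overall: "
          f"{block_factor_needed(CPU_GPU_SHARE, 2.0):.1f}x")
```

    In this simplified picture the CPU and GPU have to grow by well over 2x on their own, or other blocks such as caches have to grow too, before the whole chip doubles.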

    The Die Shrinkage

    Ok, maybe you buy all of that. How do we get 2x the transistors in roughly the same space (96 vs 102 mm^2)? The first is by a die shrink. The A6 is manufactured by Samsung on a 32nm process. General news about their fabs leads us to believe that 28nm is ready now. However, this will only get us a 20% density increase at the most optimistic estimate (you can't scale dimensions linearly or by the square either because 32nm vs 28nm refers to one dimension of a transistor, with the other not necessarily scaling linearly. Also, 32nm is a "full" node and 28nm is a "half" node. The simple answer from that is that those don't scale linearly either). Even with a massive custom circuitry undertaking and a 20% density increase from process change, the 2x factor still seems unlikely. 20nm isn't ready for any fab Apple could use either, so that's out of the question.

    Changing Foundries

    So, how do we get the rest? TSMC. TSMC is known for having denser processes at the same feature size. This can easily be seen by comparing standard ARM cores and their die sizes across processes. TSMC is noted for having a 20% or better density efficiency. So, if we compound the two 20% density improvements (1.2 x 1.2 = 1.44), we get to about 1.5x. This is about the best we can do with simple heuristics. We don't know how much custom circuitry Apple will do to further improve density. It would be overly laborious and likely fruitless to try and weigh die share versus circuitry density (CPU and GPU) to get an overall idea. In any event, it seems obvious that the move to 28nm and the move to TSMC are both necessary to get the claimed 2x transistor density.
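
    The density budget can be laid out as a short sketch. The die areas and the two 20% gains are the post's figures; treating the gains as independent multipliers is the post's heuristic, not an established scaling rule.

```python
# Density budget: how much denser the A7 must be, and how far the two
# heuristic 20% gains from the post get us.
A6_AREA_MM2 = 96.0
A7_AREA_MM2 = 102.0
TRANSISTOR_FACTOR = 2.0    # "roughly double the transistors"

# Density must rise by the transistor growth divided by the area growth.
needed_density = TRANSISTOR_FACTOR / (A7_AREA_MM2 / A6_AREA_MM2)

SHRINK_GAIN = 1.20         # 32nm -> 28nm, the post's most optimistic estimate
FOUNDRY_GAIN = 1.20        # the post's TSMC density advantage
available_density = SHRINK_GAIN * FOUNDRY_GAIN

# Whatever is left would have to come from custom circuit design.
remaining_gap = needed_density / available_density

if __name__ == "__main__":
    print(f"Needed density gain:   {needed_density:.2f}x")
    print(f"Shrink x foundry gain: {available_density:.2f}x")
    print(f"Left for custom work:  {remaining_gap:.2f}x")
```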


    But Apple won't use TSMC until A8, you say! Well, the A7 leak had a new chip letter identifier that suggested a different fab. When MacRumors consulted Chipworks about this change, Chipworks suggested that it meant the chip was TSMC. So, that seems enough of a smoking gun to me.

    Note: The amount of RAM is expected to be the same, and also from Elpida based on the picture of the A7 die (as noted by MacRumors). I expect them to change to LPDDR3 from LPDDR2 however, as all high performance mobile SoCs are doing these days.
  2. GimmeSlack12, Sep 11, 2013
    Last edited by a moderator: Sep 16, 2013

    GimmeSlack12 macrumors 603


    Apr 29, 2005
    San Francisco
  3. chrmjenkins, Sep 11, 2013
    Last edited by a moderator: Sep 16, 2013

    chrmjenkins thread starter macrumors 603


    Oct 29, 2007
    Topic title IS the TL;DR :)
  4. user-name-here macrumors 65816

    Aug 31, 2013
    Pretty disrespectful to say that to someone simply trying to impart their knowledge on the forum to enhance the community :mad:

    If it's too much for your brain to handle the information in the OP then simply move on to another thread. I personally really enjoy Mr. Jenkins' posts so please keep quiet with the "TL; DR" from now on.

    As always Mr. Jenkins, thanks again for taking the time to write the great information (same with your previous A7 thread last week). Will make the time waiting until the 20th go by faster trying to digest it :)
  5. taedouni, Sep 11, 2013
    Last edited by a moderator: Sep 11, 2013

    taedouni macrumors 65816

    Jun 7, 2011
    Thanks for the detailed post. I enjoyed reading it. I actually read the other post that you spoke of.
  6. cg399 macrumors member


    Sep 3, 2012
    Hurghada, Egypt
    Thank You!

    Can't pretend to have understood everything, but a very interesting read.

    Thanks for the insight - gave me a new respect for what they have packed into this A7 chip!
  7. iAlphard macrumors regular

    Aug 29, 2012
    Judging from the juice that powered the 5S, I think the A7 is the same as A6, which is dual core. They just added a 64 bit extension to the A7.
  8. KenAFSPC macrumors 6502a

    Sep 12, 2012
    The extra registers might net you an extra 5 percent on many tasks. Maybe 15 percent on a few tasks. But they wouldn't get you anywhere close to twice the performance.

    I concur with chrmjenkins on his analysis, although I wouldn't rule out a significant increase in clock speed, given the improved battery and potential power savings from an IGZO screen and/or "GRAM" like display buffer.
  9. JaySoul macrumors 68030


    Jan 30, 2008
    Honestly, the A7/M7 announcement was the biggest WOW moment of the entire event.

    Thanks OP for illuminating and whatnot.

    Apple definitely delivered on the 'S' part this year.
  10. chrmjenkins, Sep 11, 2013
    Last edited: Sep 11, 2013

    chrmjenkins thread starter macrumors 603


    Oct 29, 2007
    We usually get tipped off by the supply chain on display changes. That didn't happen this time, so those options seem unlikely. GRAM by itself offers very modest gains.

    As to registers, @jonSt0kes had some very good tweets on the topic today concerning increased power consumption, compiler optimization and the like. He was basically explaining why 64 bit in and of itself isn't some great performance panacea.

    By the way, this OP is a very "back-of-the-envelope" analysis. There's a lot of die size and custom vs non custom info I don't have, but the history of Apple's claims, Apple's claims today and what is known about the foundries out there provide enough meat to chew on for an educated guess.

    Edit: and pay attention to branding. Phil Schiller repeatedly referred to A7 as a "desktop" class processor. All Apple desktop computers (save entry mini) feature quad core or higher CPUs. Also gives a hint to where Apple is headed with custom CPUs.
  11. Mrbobb macrumors 601

    Aug 27, 2012
    Well if you need all the nitty-gritty.

    For most people here it's enough to rejoice that Apple gave us a faster processor without sacrificing energy efficiency or requiring a larger battery (Galaxy?). To me that's the essence of the 64 bit announcement.

    Vendors have built wide-path processors, well known to gamers (Nvidia, whatever their super-duper GPUs are these days, 256 bits?), but they tend to BURN (temperature) and want a beefier power supply from your desktop.
  12. robbieduncan Moderator emeritus


    Jul 24, 2002
    The post makes logical and mathematical sense. If we assume this analysis is correct it leaves one question: why did Apple not announce that the 5S has a quad-core CPU to further differentiate it from the 5C?
  13. PaulOBrain, Sep 11, 2013
    Last edited by a moderator: Sep 16, 2013

    PaulOBrain macrumors regular

    Aug 23, 2013
    Aww someone can't read more than a few lines? Time to go back to the kiddy books.

    Great post OP very informative.
  14. KenAFSPC macrumors 6502a

    Sep 12, 2012
    That would detract from the sales pitch for the 5C, which Apple is pushing to new customers / markets.

    Sales rep:
    "The Apple 5S has a quad core CPU, but that is out of your price range. But you can get a quad-core Android phone for the same price as the dual-core iPhone 5C."
  15. borgqueenx macrumors 65816

    Jul 16, 2010
    Thanks for the post :) Will you make another after they've torn it open and shown pictures and facts of what's inside?
  16. ABC5S Suspended


    Sep 10, 2013
    Thank you for your post. You took some time in writing this.

    Bottom line is, this is a nice upgrade for those that are looking for one, and I can almost bet some Android users will be thinking of a change to the iPhone 5S. Time will tell, but they will have the faster CPU, the more powerful GPU, the fingerprint reader, not to mention the camera, to compare with their smartphone. ;)
  17. Aussi3 macrumors 6502


    Jun 3, 2012
    Facesticks on the App store
    Thank you again for a good read! :) great post, very informative
  18. chrmjenkins thread starter macrumors 603


    Oct 29, 2007
    Apple never revealed A6 was dual core. So they don't always spout core count.

    Yes, we should have two reveals. The iFixit teardown on launch day won't help with any of the claims here. Day 1 benchmark runs will. We'll have to wait a little bit for Chipworks to do a die scan, but they'll be able to identify the number of cores.
  19. i5pro macrumors regular

    Jun 17, 2010
    Wow..thanks for the info. So how is 64-bit better on the 5S? Is it just faster? Since it's so fast, can it finally do Flash?
  20. cube macrumors G5

    May 10, 2004
  21. masands macrumors regular

    Sep 17, 2010
    Hey, can you post this on The Verge (Apple forum)?

    If you don't do it, I will and claim it for myself!

    You will get a bigger audience there and this is a fantastic post!
  22. chrmjenkins thread starter macrumors 603


    Oct 29, 2007
    Feel free. Just source it.

    As many will point out, 64 bit isn't inherently better. The reason things improve is because the move to 64 bit also usually comes with an increase in execution resources. For those instructions that do happen to be able to take advantage of doubled-width operands, it's a big deal. But those types of instructions are not the majority.

    Not sure if serious.
  23. TommyA6 macrumors 65816

    May 15, 2013
  24. lulla01 macrumors 68020


    Jul 13, 2007
    That's a very big summary, thanks for taking the time.
  25. chrmjenkins, Sep 11, 2013
    Last edited: Sep 11, 2013

    chrmjenkins thread starter macrumors 603


    Oct 29, 2007
    First 5S GPU benchmark has been uploaded. Roughly 2X of iPhone 5.

    Brian Klug noticed someone uploaded a graphics bench from 5S.

    GFXBench 2.5 Egypt HD C24Z16 - Offscreen (1080p) : 56 FPS
    GFXBench 2.5 Egypt HD C24Z16 - Onscreen : 53 FPS

    Compared to iPhone 5:
    GFXBench 2.5 Egypt HD C24Z16 - Offscreen (1080p) : 29.8 FPS
    GFXBench 2.5 Egypt HD C24Z16 - Onscreen : 41.1 FPS
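
    For anyone checking the "roughly 2X" reading, here is a minimal sketch that computes the ratios from the frame rates quoted above. The dictionary keys are made-up labels, and these are single uploaded results rather than averaged runs.

```python
# Ratio of the iPhone 5S to iPhone 5 GFXBench 2.5 Egypt HD results quoted above.
iphone_5s_fps = {"offscreen_1080p": 56.0, "onscreen": 53.0}
iphone_5_fps = {"offscreen_1080p": 29.8, "onscreen": 41.1}

# Per-test speedup: 5S frame rate divided by iPhone 5 frame rate.
ratios = {test: iphone_5s_fps[test] / iphone_5_fps[test] for test in iphone_5s_fps}

if __name__ == "__main__":
    for test, ratio in ratios.items():
        print(f"{test}: {ratio:.2f}x")
```

    The offscreen run is the one that lands near 2x. The onscreen gap is smaller, possibly because onscreen runs are bounded by the display's refresh rate; that explanation is an assumption, not something stated in the benchmark upload.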
