Nice post with great details. I enjoyed reading it. Thanks for sharing.
 
"32 bit backwards compatibility is built into ARMv8"
This is a legal fiction. It is not something inherent in the design of the ARMv8 ISA which, in fact, almost goes out of its way to be very different from the ARMv7 ISA.
There are other legal fictions in ARM world. Do we believe that Swift bothers to implement Jazelle, or PAE, or the hypervisor stuff?
Apple is not like any other ARM manufacturer. They don't have to adhere to the standard "ARM contract" between manufacturer and developer, they can define their own contract. And if that contract says "After July 2014 any app that does not include a 64-bit binary will be removed from the app store", so be it; just like if the contract says "we've now added these new 256-bit SIMD instructions", again so be it.

Everyone analyzing this is looking at it from some sort of weird 1995 perspective, where the resource in short supply is transistors, and the optimal design shares as much as possible between the 64-bit processing and the 32-bit processing. But it is NOT 1995 --- we have a billion transistors to play with. The resource in short supply is ENGINEERING TIME. It is far easier to design a pure 64-bit core (with some, possibly slow, transition logic to transfer control to a 32-bit companion core) than it is to design a 64-bit core that can ALSO execute the 32-bit ISA.

We're not talking here about creating the sort of weirdness that x86 seems to thrive upon: "what if I want to jump from 64-bit code to a plugin that is running THUMB16 code? What if I want to share pointers between 64-bit and 32-bit code?" and so on. We're talking iOS --- an extremely controlled execution environment. The only transitions that are necessary are 64-bit user -> 64-bit OS -> 32-bit user and back again, and these are not going to happen frequently --- worst case, really, at around tick frequency.

Let me put it this way: Ben Bajarin's source says it is dual core. That doesn't jibe with the heterogeneous scenario you propose.

edit: And soon developers won't have to care

You can submit 64-bit apps for iOS 7 today that take advantage of the power of iPhone 5s. Xcode can build your app with both 32-bit and 64-bit binaries included so it works across all devices running iOS 7. If you wish to continue to support iOS 6 then you will need to build for 32-bit only. Next month we will be making changes that will allow you to create a single app binary that supports 32-bit on iOS 6, as well as 32-bit and 64-bit on iOS 7.
 
Curious to see what you think of this analysis: http://seekingalpha.com/article/1694892-intel-and-apple-an-interesting-tidbit

Author makes a good case that the A7 is fabbed by Intel on its 22nm process.

The Ivy Bridge HE-4 is made on the Intel (INTC) 22nm Trigate process and has 1.4 billion transistors with a chip size of 160 sq. mm, so a billion transistor logic chip on 22nm should be around 114 sq. mm. "Hand packing" the A7 design might give another 10% size improvement, which would make the A7 chip almost exactly 102 sq. mm. on the Intel 22 nm process.
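The article's arithmetic is easy to check. Here is a minimal sketch in Python that only uses the figures quoted above; the variable names are mine, and the 10% "hand packing" gain is the article's assumption, not a measured number.

```python
# Figures quoted from the article above; names are illustrative only.
IVY_BRIDGE_TRANSISTORS = 1.4e9   # Ivy Bridge HE-4
IVY_BRIDGE_AREA_MM2 = 160.0
A7_TRANSISTORS = 1.0e9           # "approximately 1 billion"
HAND_PACKING_GAIN = 0.10         # the article's assumed 10% shrink


def scaled_area(transistors, ref_transistors, ref_area_mm2):
    """Area needed for `transistors` at the reference chip's density."""
    return ref_area_mm2 * transistors / ref_transistors


base_area = scaled_area(A7_TRANSISTORS, IVY_BRIDGE_TRANSISTORS, IVY_BRIDGE_AREA_MM2)
packed_area = base_area * (1.0 - HAND_PACKING_GAIN)

print(f"At Ivy Bridge density: {base_area:.1f} mm^2")     # 114.3 mm^2
print(f"After 10% hand packing: {packed_area:.1f} mm^2")  # 102.9 mm^2
```

So the numbers do line up with a roughly 102 mm^2 die, but only if you accept that a phone SoC would reach the same density as a desktop CPU on that process.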
 
It's a large leap from Intel offering their fab capacity to third parties for low volume products to a large volume product that they could potentially hope to win someday with an Atom product. Intel doesn't need the fab business to bolster a failing business model. They're doing very well.

That being said, Intel 22nm is not necessary either.

In the comments of that article, someone points out how dense GPUs can be (1.5 billion transistors in about 120 mm^2). The Xbox One APU fits 5 billion transistors in 360 mm^2. There are plenty of examples of chips out there that are denser than 1 billion transistors in 102 mm^2. Yes, the design will be custom and tightly packed, but 28nm is still doable.
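To put those examples on the same scale, here is a rough sketch that converts each one to millions of transistors per mm^2. The figures are the ones quoted in this thread; the labels and helper name are just for illustration.

```python
# (transistor count, die area in mm^2) as quoted in this thread.
chips = {
    "GPU cited in the article comments": (1.5e9, 120.0),
    "Xbox One APU": (5.0e9, 360.0),
    "Apple A7": (1.0e9, 102.0),
}


def density_m_per_mm2(transistors, area_mm2):
    """Transistor density in millions of transistors per mm^2."""
    return transistors / area_mm2 / 1e6


densities = {name: density_m_per_mm2(t, a) for name, (t, a) in chips.items()}
for name, d in densities.items():
    print(f"{name}: {d:.1f} M transistors/mm^2")
```

The A7 comes out at about 9.8 M/mm^2, below both comparison chips (12.5 and about 13.9), which is why the claimed density does not by itself require a 22nm process.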
 
Anandtech guesses Samsung 28nm HK+MG process, fairly certain of 1.3 GHz, sure of two cores.

anyone know what "HK+MG" is?
 
It stands for high-k metal gate. When geometries got really small, the gate dielectric became so thin that electrons started tunneling through it. Switching to a high-k dielectric (paired with a metal gate) lets the layer be physically thicker, which cuts the tunneling leakage while the gate's electric field can still control the channel.

I'll be making a longer post after my daughter goes to bed on Anand's results.
 
Looks like I never posted my itemized prediction from the other thread in this thread. I'll post it below, quoted. It includes my pre-Sept 10 event prediction and my revised prediction after the event. After the quoted post, I'll update with the latest known from the AnandTech review and explain why I got some stuff wrong, and what's still left to be known.

Time to update prediction from this:
A7
  • Manufacturer - Samsung on HKMG 28nm process
  • Die Size - 90-120 mm2
  • Designer - Apple
  • CPU Type - 1.3-1.6GHz Dual Core "Second Generation Swift Core"
  • Instruction Set - ARMv7s
  • Chip Designator - S5L8960X
  • L1 Cache - 32/32KB
  • L2 Cache - 1MB
  • RAM - 1GB LPDDR3 @ 1333 MHz (64 bit interface, PoP)
  • Max Theoretical Memory Bandwidth - 10.6 GB/s
  • GPU Type - "Quad Cluster" PowerVR 6430 @ 400 MHz
  • GPU Performance - 102.4 GFlops, 233 MTriangles/s

Bold is confirmed by Apple. Italics is changed from above prediction.

A7
  • Manufacturer - TSMC on HKMG HPM 28nm process
  • Die Size - 102 mm2
  • Transistors - approximately 1 billion
  • Designer - Apple
  • CPU Type - 1.5-1.8GHz Dual Core "Second Generation Swift 64-bit Core"
  • Instruction Set - ARMv8 with custom extensions
  • Chip Designator - S5L8960X
  • L1 Cache - 48/32KB
  • L2 Cache - 2MB
  • RAM - 1GB LPDDR3 @ 1333 MHz (64 bit interface, PoP)
  • Max Theoretical Memory Bandwidth - 10.6 GB/s
  • GPU Type - "Quad Cluster" PowerVR 6430 @ 270 MHz
  • GPU Performance - 69.2 GFlops, 157 MTriangles/s
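The revised GPU numbers follow from the first prediction if you assume peak throughput scales linearly with clock for the same GPU configuration. That linear scaling is my assumption for this sketch, not something confirmed by Apple or Imagination.

```python
# Figures from the 400 MHz prediction above.
REF_CLOCK_MHZ = 400.0
REF_GFLOPS = 102.4
REF_MTRIANGLES = 233.0


def scale_with_clock(ref_value, ref_clock_mhz, new_clock_mhz):
    """Scale a peak-throughput figure linearly with clock (assumed model)."""
    return ref_value * new_clock_mhz / ref_clock_mhz


new_clock = 270.0
gflops = scale_with_clock(REF_GFLOPS, REF_CLOCK_MHZ, new_clock)
mtriangles = scale_with_clock(REF_MTRIANGLES, REF_CLOCK_MHZ, new_clock)

print(f"{gflops:.1f} GFlops, {mtriangles:.0f} MTriangles/s at {new_clock:.0f} MHz")
```

This gives roughly 69.1 GFlops and 157 MTriangles/s, within 0.1 GFlops of the 69.2 figure in the list.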

See my explanation for these prediction changes over here: https://forums.macrumors.com/threads/1634100/

edit: the first GPU benchmark from 5S has been uploaded, supporting 2x claim.

Brian Klug noticed someone uploaded a graphics bench from 5S.

https://twitter.com/nerdtalker/status/377847764300099586
http://gfxbench.com/device.jsp?benchmark=gfx27&D=Apple+iPhone+5S&testgroup=overall

GFXBench 2.5 Egypt HD C24Z16 - Offscreen (1080p) : 56 FPS
GFXBench 2.5 Egypt HD C24Z16 - Onscreen : 53 FPS

Compared to iPhone 5:
GFXBench 2.5 Egypt HD C24Z16 - Offscreen (1080p) : 29.8 FPS
GFXBench 2.5 Egypt HD C24Z16 - Onscreen : 41.1 FPS
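For a quick check of how close this gets to the 2x claim, here is a small sketch that computes the ratios from the four scores above (the dictionary keys are just labels I made up):

```python
# GFXBench 2.5 Egypt HD scores (FPS) quoted above.
iphone_5s = {"offscreen_1080p": 56.0, "onscreen": 53.0}
iphone_5 = {"offscreen_1080p": 29.8, "onscreen": 41.1}

speedups = {test: iphone_5s[test] / iphone_5[test] for test in iphone_5s}
for test, ratio in speedups.items():
    print(f"{test}: {ratio:.2f}x")
```

Offscreen works out to about 1.88x and onscreen to about 1.29x, so the offscreen result is the one that supports the 2x claim.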

Updated prediction based on anandtech benches and review (bold now means confirmed by Apple or Anandtech):

  • Manufacturer - Samsung on HKMG 28nm process
  • Die Size - 102 mm2
  • Transistors - approximately 1 billion
  • Designer - Apple
  • CPU Type - 1.3GHz Dual Core "Cyclone" 64-bit
  • Instruction Set - ARMv8
  • Chip Designator - S5L8960X
  • L1 Cache - 64/64KB
  • L2 Cache - 1MB
  • RAM - 1GB LPDDR3 @ 1333 MHz (64 bit interface, PoP)
  • Max Theoretical Memory Bandwidth - 10.6 GB/s
  • GPU Type - "Quad Cluster" PowerVR 6430 @ 270 MHz
  • GPU Performance - 69.2 GFlops, 157 MTriangles/s
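The memory bandwidth line is just data rate times bus width. A minimal sketch, assuming the "1333 MHz" in the list is the effective data rate (transfers per second per pin) on the 64-bit interface:

```python
# Assumption: "1333 MHz" is the effective data rate in mega-transfers/s.
DATA_RATE_MT_S = 1333
BUS_WIDTH_BITS = 64


def peak_bandwidth_gb_s(data_rate_mt_s, bus_width_bits):
    """Peak theoretical bandwidth in GB/s (1 GB = 1e9 bytes)."""
    bytes_per_transfer = bus_width_bits / 8
    return data_rate_mt_s * 1e6 * bytes_per_transfer / 1e9


bandwidth = peak_bandwidth_gb_s(DATA_RATE_MT_S, BUS_WIDTH_BITS)
print(f"Peak theoretical bandwidth: {bandwidth:.2f} GB/s")
```

That comes to about 10.66 GB/s, which matches the 10.6 GB/s in the list. Real-world bandwidth will be lower than this peak.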

The first thing I got wrong according to Anandtech is the foundry. Anand Shimpi believes it's still Samsung, whereas I predicted TSMC. I still think this one is up in the air given the mystery identifier on the leaked A7 PCB that doesn't match Samsung naming. We'll see.

The second thing I got wrong was the core speed and core count. My original prediction before the event was a dual core still based on the ARMv7-A ISA like Swift was. I predicted a modestly higher clock at about 1.5 GHz for moderately higher performance. After the event gave me the die size and number of transistors along with claimed performance improvements, I assumed that quad core was needed for their claimed transistor count and performance metric. 2x would require a very high IPC bump without much room for a clock bump. I then revised back to dual core after analyst Ben Bajarin informed me that he had a source saying dual core for sure. For all I know, Anand was the source :)

My updated cache prediction was based on what we know about the stock A57 core from ARM. It looks like Apple opted to go a little higher and do a full 64KB instruction and data cache. Anand notes that it's a little slower to access L1 cache now, but the hit rate (successful finds of desired data) has improved because of the larger cache size. L2 cache latency has greatly improved, as has latency to main memory. He also saw up to 2x memory bandwidth in some synthetic cases. This huge improvement to the memory front-end is not all that surprising because it's one of the main improvements Qualcomm implemented in between Krait revisions, which Swift was very similar to.

The design is also out-of-order as we've had since the Cortex A9 in the Apple A5. Branch prediction is up too. All these improvements help IPC.

He also notes that the RAM is indeed 1GB as predicted, and that it's also very likely LPDDR3 as predicted. Speed is unknown as of yet, but the silkscreen markings on the A7 package from the teardown will confirm that for us.

Getting to the benchmarks, it's pretty clear even 32 bit binaries benefit from the beefed up execution resources too. The A7 allows 32 and 64 bit binaries to run side-by-side transparently, which will make it easier on developers. Anand also notes that the increase in register count alone gave up to a 10% performance boost on the x86 64 bit transition because of reduced register pressure.

Many of the benchmarks hit the claimed 2x performance metric set by Apple. It's clear that the improved memory subsystem, improved core execution units, increased register count and move to 64 bit binaries have indeed increased the IPC to a realm that hits near Apple's line in the sand. It's very impressive when you consider that it's still only dual core and it's still only 1.3 GHz.
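To show why a 2x result at 1.3 GHz points at IPC, here is a back-of-the-envelope sketch. It assumes single-thread performance scales as IPC times clock and that the previous-generation core also ran at 1.3 GHz; both are my assumptions for illustration, not figures from the review.

```python
# Assumptions (not from the review): performance ~ IPC x clock, and the
# previous-generation core ran at 1.3 GHz.
BASELINE_CLOCK_GHZ = 1.3
TARGET_SPEEDUP = 2.0   # Apple's claimed "up to 2x"


def required_ipc_gain(target_speedup, new_clock_ghz, old_clock_ghz):
    """IPC ratio needed to reach target_speedup given the clock ratio."""
    clock_gain = new_clock_ghz / old_clock_ghz
    return target_speedup / clock_gain


# The measured 1.3 GHz plus the 1.5 and 1.8 GHz guesses from my predictions.
ipc_gains = {
    clock: required_ipc_gain(TARGET_SPEEDUP, clock, BASELINE_CLOCK_GHZ)
    for clock in (1.3, 1.5, 1.8)
}
for clock, gain in ipc_gains.items():
    print(f"{clock:.1f} GHz -> needs {gain:.2f}x IPC")
```

Under this simple model, hitting 2x with no clock bump needs a full 2x gain per clock, versus roughly 1.73x at 1.5 GHz and 1.44x at 1.8 GHz. In practice the "IPC" gain here lumps together the memory subsystem, the wider core and the move to 64 bit binaries.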

On clock speed, Anand notes that device manufacturers are clocking ARM cores higher than ARM intended, and they're paying for it in higher core voltages. They're compensating with much larger batteries, too.

It's interesting to note that Anand says no one in the mobile space has done variable clock frequency quite right yet, so it will be interesting to see where Apple goes in the future there. I'm also anxious to see if they've updated the power management IC (PMIC).

The GPU looks like a spot on hit. It was kind of easy to guess given it was likely to be a stock choice on Apple's part and there were realistically only 2 choices given reasonable clock speeds. I favored larger and slower given Apple's history and that turned out to be right.

For transistor density, Anand notes that there's clearly some design process overhaul leading to them packing transistors so tightly. The large G6430, which is inherently more dense because of its resource breakdown compared to 5XT GPUs, helps them achieve their density increase. The slightly larger caches help ever so slightly too.

For the 1 billion transistor claim, Brian Klug from anandtech thinks they're counting M7 too. I don't because they announced this number before talking about the M7 at all. I think there's likely a significant amount of resources spent on the secure enclave for TouchID and also for a dedicated buffer/image signal processor to handle the advanced 5S camera functions.

All in all, the A7 is a very impressive update, and even users coming from an iPhone 5 will feel the speed increase in their everyday tasks.

Next we need the teardown to possibly confirm the manufacturer and to confirm the RAM type. After a Chipworks die shot, we'll know the foundry for sure and figure out how all of those transistors are being spent.
 
I found this to be the most telling statement about the new CPU in Anand's excellent review:

At its launch event Apple claimed the A7 offered desktop class CPU performance. If it really is performance competitive with Bay Trail, I think that statement is a fair one to make. We're not talking about Haswell or even Ivy Bridge levels of desktop performance, but rather something close to mobile Core 2 Duo class.

We are talking about a phone with the same sort of CPU power as a laptop from only a few years ago. That is truly amazing.
 
Fully agree, but for me the most exciting prospect is where Apple can go from here. The A7 in the iPhone 5s is constrained by power, heat dissipation and sheer physical space. The net result is an SoC that makes the 'right' compromises to achieve its performance, and that surely leaves a lot of headroom once those constraints ease.

First interesting reference will be an A7X in the iPad but if they do go to a larger screen size next year in the iPhone that'd give them a much bigger battery to play with too. An A7 (well, A8 at that point) quad core CPU design, possibly with higher clocks to boot? Really easy way to get another 'up to 2x' speed boost from the SoC.

Still think Apple may push the boat out with the A7X this year and position the iPad as a full-on pro machine. Looking at relative performance levels, surely an A7 variant with, say, a quad core CPU clocked up a bit along with, say, 2GB of RAM would be able to manage an acceptable implementation of Aperture and the like...
 
I thought we were at the point where specs don't really matter, it just needs to work!
 
The iFixit teardown is in progress right now. We've learned a few things: the baseband is still the 9615M from Qualcomm, and the transceiver got an upgrade to the WTR1605L so it can do TD-SCDMA (China Mobile).

The audio codec, apple/cirrus amp chip and/or PMIC are all new parts. I don't think the codec has been identified yet because the two Apple named parts shown so far look way too big.

Also, the silkscreen code starts with "N" like all other previous A-series chips. This is different than the leaked PCB which started with "K". That must have been a test part of some sort, if not a TSMC part for test purposes as well. Looks like they indeed stayed with Samsung this time.

It also appears that the "M7" is actually built into the A7. Not a separate chip.

edit: Ha. iFixIt referenced my post for the RAM size.

edit 2: For reference, here's the leaked A7 image from June:
[image: iphone_5S_chip.jpg]


Here's the one from the teardown:
[image: A7 package from the iFixit teardown]


You can see they were made 37 weeks apart. Almost 9 months. Samsung would have had very limited 28nm capacity at that point (risk production) so it's possible we're looking at a TSMC test chip.

edit 3: iOS 7 now supports digital audio stream out for the iPhone via the camera connection kit like the iPad does. Unfortunately, 24-bit/96 kHz audio is still streaming only. I would say this kills the idea of an improved codec in the 5s.
 
The Chipworks die picture shows some strange things, like the size of the dual Cyclone cores. These are huge compared to any A15 core. The SRAM block is also intriguing. Could this be implemented like the Xbox One's eSRAM, acting as a cache for both the CPU and GPU? By the looks of it, it could be anywhere from 4MB to maybe 6MB. It is a darn shame that Apple keeps its high-level architecture such a secret. At least they ought to have released details of older chips like the A6 Swift now that they have jumped a generation forward.
Many are really keen to see the actual performance of the A7 once it can be properly tested in the near future. Early indications look very impressive indeed! :D
 