The History of Apple SoCs
The purpose of this piece is to preview and predict the features of the iPhone 6S (and related iPads). If you have not read the MacRumors rumor roundup, please go do so before reading this.
A Look Back to Last Year
This will mark the third year I've done an Apple SoC and device preview. I did an A7 prediction thread two years ago, and an A8 prediction last year, both of which you can find linked in the more reading section. For the A7, the prediction proved to be almost exactly on for the GPU, just narrowly missing on operating frequency but getting the configuration correct. It also correctly predicted the foundry and process, though that was not hard with the strength of the rumors suggesting Samsung's 28nm process. It also predicted the RAM type (LPDDR3) and size (1GB), but missed on frequency (1333 vs. 1600 MHz).
The A8 prediction was very close on the CPU, but missed significantly on the GPU, as the device ended up featuring four GPU "clusters" again instead of the predicted six. A major miss was the amount of RAM predicted. Prior to the A8 and iPhone 6, at least every other generation of iPhone had seen an increase in RAM capacity. The 10% shrink in die size compared to the A7 was also quite unexpected, especially given that the die roughly doubled in total number of transistors.
The post also accurately predicted the cellular radio and transceiver, which was greatly assisted by leaked PCBs that showed the 9625M modem. The modem necessitated the presence of the companion WTR1625 and WFR1620 chips, which were also located. PCB analysis also revealed the QFE1100 envelope tracker for dynamic RF chain power adjustment. The WiFi footprint had grown relative to previous generations, and the existence of an integrated Broadcom solution for wireless AC made it clear that a solution featuring the AC standard was likely.
Display
The display was predicted to change, in line with the strong rumors of both 4.7 inch and 5.5 inch displays. No quality improvements were predicted, mostly because no rumors or analysis of leaks led us to believe the displays would be anything but a shift in size from the 4 inch screen, which had already improved color gamut and adopted in-cell touch assemblies over the original retina display. Apple made it a point to mention specific improvements to the display in their keynote, a primary example being dual domain pixels for reduced color distortion at off angles. Anandtech's and DisplayMate's reviews of the iPhone 6 and 6 Plus displays revealed improved contrast, color accuracy and off-angle performance. To summarize, it was the best LCD display ever featured in a phone.
The use of sapphire over Gorilla Glass for the display was perhaps the hottest talking point surrounding the new iPhone's development, as sapphire was already a staple of the iPhones' camera lenses. There was much debate over whether sapphire displays would be included, despite the strength of the rumors regarding their development. It was eventually revealed that Gorilla Glass was again in use, which made sense given the ultimate resolution of Apple's bankruptcy settlement with sapphire producer GT Advanced. It seems clear that Apple intended to use sapphire in their displays, but the plans succumbed to poor management at GT Advanced.
Flash Storage
NAND storage was an interesting topic because of schematic leaks that showed 16GB, 64GB and 128GB storage options, making for an easy prediction. This matched with flash makers' announcements of NAND densities sufficient to reach 128GB modules with eight die inside a single package. The confusing part of the schematic rumors definitely centered around a 1GB flash module, which was briefly mistaken for RAM. This 1GB NAND was never located on the board or inside the application processor package, so if the leak was legitimate, it would need to be housed inside the flash memory package with the other die. As to its purpose, I suggested at the time that it could be related to secure storage of fingerprint or health data. Given that this 1GB pool was never confirmed to exist, we'll have to relegate it to rumor once again.
Battery
Thanks to increased chassis size due to larger displays, larger capacity batteries seemed a given, a desire that had been in constant war with Apple's quest for thinner devices. There were several leaks featuring potential batteries for the new iPhones that turned out to be accurate. The iPhone 6 had 1810 mAh and 2100 mAh capacities rumored, whereas the iPhone 6 Plus had 2915 mAh. The smaller size turned out to be accurate for the iPhone 6, while the only 6 Plus leak was also accurate. Apple changed battery chemistry at the time of the iPhone 5 with much of the rest of the industry. The new chemistry allowed for more efficient discharge and perhaps enhanced battery lifetime, and there haven't been any interesting battery rumors to cover since that point outside of capacity.
CPU
CPU architecture is an area in which Apple had made tremendous strides in successive years prior to last year's A8. A6 was Apple's first full custom design, which was followed by an industry first 64-bit A7. The year over year 2x performance improvement from each of these was made possible by ISA and process technology advancements, along with a very generous helping of engineering talent. Last year, Apple had a full node transition from 28nm to 20nm at their disposal, but my CPU prediction was very muted based on the strength of the A7 design.
Apple's A7 was a huge advancement in mobile CPU architecture. With an issue width of six micro-ops, Apple was rivaling desktop processors. Nvidia's custom 64 bit core does have an issue width of seven, but we're likely reaching the practical limits of issue width for today's workloads. Skylake, Intel's latest microarchitecture, also features a dispatch of six micro-ops per cycle. A handful of smaller optimizations to the architecture were predicted, including things like improved ALUs, an improved memory architecture, and reduced latency for complex ALU operations. All of those improvements came to light through Anandtech's benchmarking in their A8 review. Apple also seems to have moved to independent L2 cache designs for each core, as noted by Chipworks. This makes sense, as the L3 last level cache is still shared.
One focus Apple emphasized was sustained performance on A8. Although performance gains were only up to 20% in most cases, they wanted to reduce the throttling becoming more common in mobile processors as thermal limits were reached. As Anandtech's review shows, they definitely achieved this goal. The review remarked how it was the first processor to be put through the benchmark suites that did not throttle during performance tests. This is pretty significant, as mobile SoC makers had been progressively pushing for more peak performance to advertise the gains for which consumers had developed an appetite.
GPU
The GPU prediction from last year had two main components, the first of which was another architecture upgrade. The A7 saw Apple adopt Imagination Technologies' "Rogue" Series 6 architecture just a little over a year after its announcement. In early 2014, ImgTec announced the XT series addition to the Rogue family. It had features like ASTC compression support, as well as finer DVFS options, dubbed 'PowerGear'. The XT series also increased FP16 throughput by 33%, effectively making it double that of the FP32 throughput. While FP16 vs. FP32 usefulness on mobile is a debate of its own, the end result was Apple claiming up to 50% speed improvements. Most benchmarks showed modest gains, with some cases pushing towards the 50% theoretical improvement.
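As a quick sanity check on the FP16 claim, here is a minimal sketch that uses only the FP32/FP16 GFlops figures quoted in this post's A7 and A8 spec lists. The function names are mine, purely for illustration, and the inputs are this post's listed numbers rather than measurements.

```python
# FP32/FP16 GFlops as listed in this post's spec lists (not measured values)
A7_FP32, A7_FP16 = 73.9, 110.8   # A7, Rogue Series 6
A8_FP32, A8_FP16 = 73.9, 147.8   # A8, Rogue Series 6XT


def fp16_to_fp32_ratio(fp32_gflops: float, fp16_gflops: float) -> float:
    """How many FP16 operations are available per FP32 operation."""
    return fp16_gflops / fp32_gflops


def relative_gain(old: float, new: float) -> float:
    """Fractional increase going from old to new (0.33 means +33%)."""
    return new / old - 1.0


if __name__ == "__main__":
    print(f"A7 FP16:FP32 ratio = {fp16_to_fp32_ratio(A7_FP32, A7_FP16):.2f}")
    print(f"A8 FP16:FP32 ratio = {fp16_to_fp32_ratio(A8_FP32, A8_FP16):.2f}")
    print(f"FP16 gain, A7 to A8 = {relative_gain(A7_FP16, A8_FP16):.1%}")
```

With those inputs the ratio moves from roughly 1.5 to 2.0 at the same clock and cluster count, which is where the "33% more FP16" and "double the FP32 rate" statements come from.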
The second reason a six cluster GPU was predicted was the full node transition from 28nm to 20nm. With mild changes to the CPU predicted, there would be plenty of extra transistors for GPU increases if the die size stayed relatively constant. Apple had also pushed GPU performance quite hard generation after generation. It turns out that transistor growth for a given cluster count increases quite dramatically going from the G6x30 series GPUs to their GX6x50 successors. Although a full node shift could theoretically cut the area of a given design roughly in half, the GPU only shrank about 10% from A7 to A8. It's clear that the extra ALUs, compression hardware, and cluster power control circuits have a hefty transistor overhead.
RAM
The stalemate on application processor RAM was somewhat of a surprise. Prior to the iPhone 6 and 6 Plus, the iPhone had doubled RAM at least every other generation. We did see the iPad Air 2 gain 2GB of RAM with the A8X, but that move was to help support the split-screen multi-tasking of the full size iPad. Also, while there had been rumors of Apple adopting LPDDR4 for the A8, it seemed too early in the memory's product lifecycle for sufficient volume to be available to Apple, resulting in LPDDR3 being used again.
Regarding RAM, I'd like to call back to the A7 design when Chipworks examined the package. There was a large increase in pads, which suggested that we would see a 128 bit memory interface rather than the 64 bit memory interface that had become common on Apple processors. This large padout was again seen on the A8 package, and we still have no answer for the increased number of pads, as those processors feature 64-bit memory interfaces. Perhaps Chipworks has since discovered the answer and keeps it in one of their paid reports, but it's an interesting development to keep in the back of your mind as we move forward.
Audio
Device audio is another area of small but keen interest for some. Android has recently seen some adoption of 24-bit audio to cater to audiophiles, and suggestions that Apple would require artists to submit 24-bit masters to iTunes stoked the flames for some. Anandtech covers audio performance pretty well in their A8 review, giving the audio codec praise for its harmonic distortion and noise performance. This seems to be an area where we may see some activity, as some think there are some obvious gains to be had by improving audio bit depth.
Rather than enhancing the DAC or amplifier stages of the audio chain, I also theorized that Apple could leverage their MFi program to create hardware through their recently acquired Beats brand that featured built-in DACs and/or amps to market to the audiophile segment. This possibility did turn out to be a reality, but via third party Philips.
A8X
Last year, we also saw the re-introduction of an iPad specific processor via the A8X. After the A7 remained the only application processor design for the iPhone 5S, iPad Air, and Retina iPad mini, many had assumed that Apple had now deemed GPU performance sufficient for one SoC to power all the iDevices released in a given year. A8X proved that wrong, with a GPU double the size, a third CPU core, double the L2 cache, and double the RAM. The new multitasking features in iOS served as an impetus to have a high performance tablet version of the A8 SoC. In retrospect, the design effort that the A7 represented could be the reason it existed as the sole processor product of Apple that year. Apple was devising a huge architecture shift on a very new ISA in addition to a new GPU architecture adoption, so all design resources may have been required to hit their target goals. With the more modest architecture changes on A8, they could devote resources to developing the A8X on a similar timeframe.
One last thing to mention about the A8X is what the die shot shows. We know from diagnostics that the CPU cores have 2MB of L2 cache. If you grab the L2 cache from the A8 shot, you can superimpose it and see that there clearly appear to be blocks two times the size of the A8 L2 cache. However, there are only two of them despite there being three cores. What's going on there, an asymmetric cache structure where two cores share an L2? That seems unlikely and I'm probably missing something obvious, but it's an interesting difference to point out.
Fixed Function Blocks
I have saved the most interesting part of the A8 for last, and that centers around transistor density and die size. As soon as Apple announced the transistor count had approximately doubled from the A7 to A8, I began to imagine ways they could have incurred such transistor growth. An obvious candidate would have been the aforementioned predicted GPU size increase. SRAM size increase would be another obvious one, as their six transistor bit structures are easy ways to push up transistor count while also being very dense. That didn't turn out to be true, either, with L1, L2 and L3 caches all staying the same size.
The GPU did see its transistor budget increase, though not dramatically, while the CPU increases were very minimal. This turns us to fixed function blocks. Apple's efforts in fixed function and mixed signal blocks have been an area of interest for some time. From the keynote, we know two significant fixed function blocks that would cause transistor growth. The first was a change to the ISP that allowed for 240 fps slo-mo video, doubling the previous 120 fps slo-mo. The second was what Apple referred to as a "desktop class" image scaler. With the introduction of the iPhone 6 and 6 Plus, graphical elements were no longer necessarily neat multiples of one another across generations, so Apple allowed the iPhone to downsample graphics drawn at a higher resolution to preserve nice pixel multiples.
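To make the scaler's job concrete, here is a small sketch using the iPhone 6 Plus as the example. The numbers are not from the discussion above: they are the published figures for that device (a 414 x 736 point layout drawn at 3x and shown on a 1080 x 1920 panel), and the helper name is hypothetical.

```python
from fractions import Fraction

# Assumed figures for the iPhone 6 Plus (not taken from this post)
POINTS = (414, 736)       # logical layout size in points
RENDER_SCALE = 3          # each point is drawn as a 3x3 block of pixels
PANEL = (1080, 1920)      # physical panel resolution in pixels


def rendered_size(points: tuple[int, int], scale: int) -> tuple[int, int]:
    """Size of the off-screen image drawn at an integer pixel multiple."""
    width, height = points
    return (width * scale, height * scale)


if __name__ == "__main__":
    rendered = rendered_size(POINTS, RENDER_SCALE)
    # Ratio the hardware scaler must apply to fit the rendered image to the panel
    ratio_w = Fraction(PANEL[0], rendered[0])
    ratio_h = Fraction(PANEL[1], rendered[1])
    print(f"rendered at {rendered[0]} x {rendered[1]}")
    print(f"downsample ratio: {ratio_w} (width), {ratio_h} (height)")
```

The point of drawing at a clean 3x multiple is that artwork stays on whole-pixel boundaries; the non-integer step down to the panel is left to dedicated hardware instead of the GPU.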
These fixed function additions don't tell the whole story, but they begin to explain how Apple could have such large transistor growth without significantly increasing the main sources of transistors: GPU, CPU and SRAM. What is also very interesting is the transistor density scaling they achieved. From an absolute ideal standpoint, a given design could scale to approximately half of its size on a full node shrink. However, not all circuit types scale the same way, and transistors don't always scale the same way in channel width as they do in length. Realistically, you are then looking at a scale factor of probably 0.7. However, going from 1 billion to 2 billion transistors (and perhaps rounding is on Apple's side here) means Apple is looking at a near 0.45 scale factor. SRAM/cache would be the best bet for increasing transistor density, yet the caches did not change in capacity at all and Apple still achieved this scaling factor. Once again, Apple has some extremely talented engineers.
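The scale factor can be reproduced from the die sizes and approximate transistor counts given in this post (102 mm2 and about 1 billion for the A7, 89 mm2 and about 2 billion for the A8). This is only a rough sketch: the function names are mine, the transistor counts are Apple's rounded figures, and the "ideal" case simply assumes area scales with the square of the node name, which real processes do not strictly follow.

```python
def area_per_transistor(die_area_mm2: float, transistor_count: float) -> float:
    """Average die area consumed per transistor, in mm^2."""
    return die_area_mm2 / transistor_count


def ideal_area_scale(old_node_nm: float, new_node_nm: float) -> float:
    """Idealized area scale factor if both dimensions shrink with the node name."""
    return (new_node_nm / old_node_nm) ** 2


if __name__ == "__main__":
    a7 = area_per_transistor(102, 1e9)   # A7: 102 mm^2, ~1 billion transistors
    a8 = area_per_transistor(89, 2e9)    # A8: 89 mm^2, ~2 billion transistors

    achieved = a8 / a7                   # area per transistor, A8 relative to A7
    ideal = ideal_area_scale(28, 20)     # 28nm to 20nm, idealized
    die_shrink = 1 - 89 / 102            # change in total die area

    print(f"ideal scale factor:    {ideal:.2f}")
    print(f"achieved scale factor: {achieved:.2f}")
    print(f"total die shrink:      {die_shrink:.1%}")
```

The achieved figure comes out a little under 0.45, which is better than the idealized factor of about 0.5 and well ahead of the 0.7 one would realistically expect. That gap is the surprise described above.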
What You Should Have Read This Past Year
In this section, I'd like to highlight some reading from the past year that provides context for the state of the industry and technology at this time.
Anandtech's iPhone 6 Review - Anandtech continues to put out the best iDevice reviews even after losing Shimpi and Klug to Apple. They are the reason we have many of the microarchitecture details on Apple's custom SoCs.
Anandtech's iPad Air 2 Review - In addition to the above, this gives us a look at the A8X.
Displaymate iPhone 6 and 6 Plus Display Review - A critical look at display performance on the iPhones released last year.
Enhanced Sapphire - DisplayMate gives a short technology overview of a new type of sapphire that beats glass in reflectance tests, which means better ambient light performance, all while still being more scratch resistant than glass.
Thin Loop Heat Pipe - Fujitsu has developed a heat pipe that dramatically improves cooling performance in a form factor suitable for phones. This isn't just about being able to use more power in a given thermal envelope. Lowering heat for normal CPU operations lowers leakage current due to heat and can have other benefits.
Light-splitting camera patent - Apple patented a system for splitting light into its individual color components to enhance image quality. They adapted a technology often seen in larger video capture devices to a mobile form factor.
Tile-based rendering overview - ImgTec explains how their graphics processors work using tile-based rendering. It's a proven power efficient rendering method, which is why it has found such a good home in mobile spaces.
Apple Watch System-In-Package - A great short blog entry that covers the packaging advance that the Apple Watch represents.
Semiconductors from idea to product - A great overview of the flow of semiconductor IP from idea to real silicon. Gives a great context to understanding feasibility and time-to-market concerning rumors.
An Introduction to Semiconductor Physics, Technology, and Industry - Great companion read to the link above. Gives a broad but detailed overview of the state of semiconductors, process flows, and challenges faced by the IC industry. If you only read two links, make it these two.
Inside ARM's Cortex-A72 microarchitecture - A dive into ARM's latest high performance 64 bit reference core. This will power many of the rival smartphones Apple will face and will serve as one of the comparison points in benchmarks.
A comparison of HBM, Wide IO, and Hybrid Memory Cube - an overview and comparison of the 2.5D and 3D memory technologies that will eventually take over for LPDDR memory in mobile devices.
Avago purchase of Broadcom - Avago and Broadcom both have IP in several generations of the iPhone. The acquisition could have implications for Apple and the industry as a whole.
Teardown of the iPhone 6 Plus battery - Nice overview of today's battery technology with a cost analysis thrown in.
Comparing OpenGL ES To Metal - A benchmark comparison of the standard OpenGL ES and Apple's proprietary Metal API.
The IP licensing model - A follow-on to the IP development process from another ImgTec engineer. It covers the relationship between device IP licenser and licensee.
Understanding Qualcomm's Snapdragon 810 - While it covers the entire Qualcomm Snapdragon solution, the sections on the RF components are good, especially since the next iPhone will use some of them.
Understanding Qualcomm's ImproveTouch - On the eve of Force Touch displays, it's important to understand the competitors' technologies and solutions.
Galaxy Note 5 Display Shootout - Samsung now makes the best reviewed smartphone displays, so they are the benchmark for Apple.
Hunting Down a Turncoat - A very well written account of the former TSMC executive who became disenchanted with his employer over a skipped promotion and eventually gave his company's technology process secrets to Samsung.
To know where we're going, we need to know where we came from. Prior to the A4, Apple sourced Samsung SoCs for the iPhone, iPhone 3G and iPhone 3GS. Let's take a look at Apple's custom SoCs.
A4:
[10]
- Manufacturer - Samsung on 45nm process (as featured in iPhone 4)
- Die Size - 53 mm2 [5]
- Designer - Apple (Intrinsity[3], also featured in Samsung's 'Hummingbird' SoC[4])
- CPU Type - 800MHz Cortex-A8 Core with customizations
- Core Count - 1
- Instruction Set - ARMv7
- Chip Designator - S5L8930X
- L1 Cache - 32/32KB (Instruction/Data)
- L2 Cache - 512KB
- RAM - 256MB LPDDR @ 400 MHz (64 bit interface, PoP)
- Max Theoretical Memory Bandwidth - 3.2 GB/s [1]
- GPU Type - PowerVR SGX 535 @ 200 MHz
- GPU Performance - 1.6 GFlops, 14 MTriangles/s [2]
A5:
[9]
- Manufacturer - Samsung on 45nm process (as featured in iPhone 4S)
- Die Size - 122.2 mm2
- Designer - Apple
- CPU Type - 800MHz Cortex-A9 Core with customizations
- Core Count - 2
- Instruction Set - ARMv7
- Chip Designator - S5L8940X
- L1 Cache - 32/32KB (Instruction/Data)
- L2 Cache - 1MB
- RAM - 512MB LPDDR2 @ 800 MHz (64 bit interface, PoP)
- Max Theoretical Memory Bandwidth - 6.4 GB/s [1]
- GPU Type - Dual Core PowerVR SGX 543 @ 200 MHz
- GPU Performance - 14.4 GFlops, 70 MTriangles/s [2]
A5X:
[8]
- Manufacturer - Samsung on 45nm process (as featured in 3rd generation iPad)
- Die Size - 165 mm2
- Designer - Apple
- CPU Type - 1GHz Cortex-A9 Core with customizations
- Core Count - 2
- Instruction Set - ARMv7
- Chip Designator - S5L8945X
- L1 Cache - 32/32KB (Instruction/Data)
- L2 Cache - 1MB
- RAM - 1GB LPDDR2 @ 800 MHz (128 bit interface, off package)
- Max Theoretical Memory Bandwidth - 12.8 GB/s [1]
- GPU Type - Quad Core PowerVR SGX 543 @ 250 MHz
- GPU Performance - 36 GFlops, 175 MTriangles/s [2]
A6:
[7]
- Manufacturer - Samsung on HKMG 32nm process
- Die Size - 96.71 mm2
- Designer - Apple
- CPU Type - 1.3GHz "Swift" Core
- Core Count - 2
- Instruction Set - ARMv7s
- Chip Designator - S5L8950X
- L1 Cache - 32/32KB (Instruction/Data)
- L2 Cache - 1MB
- RAM - 1GB LPDDR2 @ 1066 MHz (64 bit interface, PoP)
- Max Theoretical Memory Bandwidth - 8.5 GB/s [1]
- GPU Type - Triple Core PowerVR SGX 543 @ 325 MHz
- GPU Performance - 36 GFlops, 175 MTriangles/s [2]
A6X:
[6]
- Manufacturer - Samsung on HKMG 32nm process
- Die Size - 123 mm2
- Designer - Apple
- CPU Type - 1.4GHz "Swift" Core
- Core Count - 2
- Instruction Set - ARMv7s
- Chip Designator - S5L8955X
- L1 Cache - 32/32KB (Instruction/Data)
- L2 Cache - 1MB
- RAM - 1GB LPDDR2 @ 1066 MHz (128 bit interface, off package)
- Max Theoretical Memory Bandwidth - 17 GB/s [1]
- GPU Type - Quad Core PowerVR SGX 554 @ 300 MHz
- GPU Performance - 86.4 GFlops, 210 MTriangles/s [2]
A7:
[8]
- Manufacturer - Samsung on HKMG 28nm process
- Die Size - 102 mm2
- Transistors - Approximately 1 billion
- Designer - Apple
- CPU Type - 1.3GHz "Cyclone" 64-bit Core (1.4GHz for iPad products)
- Core Count - 2
- Instruction Set - ARMv8-A (with custom Apple extensions)
- Chip Designator - S5L8960X
- L1 Cache - 64/64KB (Instruction/Data)
- L2 Cache - 1MB
- L3 Cache - 4MB
- RAM - 1GB LPDDR3 @ 1600 MHz (64 bit interface, PoP for iPhone)[13][14]
- Max Theoretical Memory Bandwidth - 12.8 GB/s
- GPU Type - "Quad Cluster" PowerVR G6430 @ 450 MHz[15]
- GPU Performance - 73.9/110.8 GFlops (FP32/FP16)
A8:
[16]
- Manufacturer - TSMC on HKMG 20nm process
- Die Size - 89 mm2
- Transistors - Approximately 2 billion
- Designer - Apple
- CPU Type - 1.4GHz "Typhoon" 64-bit Core
- Core Count - 2
- Instruction Set - ARMv8-A (with custom Apple extensions)
- Chip Designator - APL1011
- L1 Cache - 64/64KB (Instruction/Data)
- L2 Cache - 1MB
- L3 Cache - 4MB
- RAM - 1GB LPDDR3 @ 1600 MHz (64 bit interface, PoP for iPhone)[14]
- Max Theoretical Memory Bandwidth - 12.8 GB/s
- GPU Type - "Quad Cluster" PowerVR GX6450 @ 450 MHz
- GPU Performance - 73.9/147.8 GFlops (FP32/FP16)
A8X:
[17]
- Manufacturer - TSMC on HKMG 20nm process
- Die Size - 128 mm2
- Transistors - Approximately 3 billion
- Designer - Apple
- CPU Type - 1.5GHz "Typhoon" 64-bit Core
- Core Count - 3
- Instruction Set - ARMv8-A (with custom Apple extensions)
- Chip Designator - APL1012
- L1 Cache - 64/64KB (Instruction/Data)
- L2 Cache - 2MB
- L3 Cache - 4MB
- RAM - 2GB LPDDR3 @ 1600 MHz (128 bit interface, off package)[14]
- Max Theoretical Memory Bandwidth - 25.6 GB/s
- GPU Type - "Octo Cluster" PowerVR GX6850 @ 450 MHz
- GPU Performance - 147.8/295.6 GFlops (FP32/FP16)
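The "Max Theoretical Memory Bandwidth" entries in the lists above follow from bus width and data rate. Here is a minimal sketch of that arithmetic, assuming the listed MHz figure is the effective transfer rate (millions of transfers per second) and that GB means 10^9 bytes. The function name and the labels in the table are mine.

```python
def peak_bandwidth_gb_s(bus_width_bits: int, transfer_rate_mt_s: float) -> float:
    """Peak bandwidth in GB/s: bytes moved per transfer times transfers per second."""
    bytes_per_transfer = bus_width_bits / 8
    return bytes_per_transfer * transfer_rate_mt_s * 1e6 / 1e9


# (bus width in bits, listed data rate) taken from the spec lists above
CONFIGS = {
    "A4, 64 bit LPDDR @ 400": (64, 400),
    "A5, 64 bit LPDDR2 @ 800": (64, 800),
    "A6, 64 bit LPDDR2 @ 1066": (64, 1066),
    "A6 iPad variant, 128 bit LPDDR2 @ 1066": (128, 1066),
    "A8, 64 bit LPDDR3 @ 1600": (64, 1600),
    "A8X, 128 bit LPDDR3 @ 1600": (128, 1600),
}

if __name__ == "__main__":
    for label, (width, rate) in CONFIGS.items():
        print(f"{label}: {peak_bandwidth_gb_s(width, rate):.1f} GB/s")
```

Doubling the interface width at the same data rate doubles the peak figure, which is why the iPad parts with 128 bit interfaces list twice the bandwidth of their iPhone counterparts.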
Apple SoC family attributes:
A9 Prediction
A9
- Manufacturer - Samsung on 14nm 14LPE FinFET process*
- Die Size - 100-110 mm2
- Designer - Apple
- CPU Type - 1.4GHz Fourth Generation Custom Apple Core
- Core Count - 2
- Instruction Set - ARMv8-A (with custom Apple extensions)
- L1 Cache - 64/64KB
- L2 Cache - 1MB
- L3 Cache - 8MB
- RAM - 2GB LPDDR4 @ 1600 MHz (64 bit interface, PoP for iPhone)
- Max Theoretical Memory Bandwidth - 12.8 GB/s
- GPU Type - "Quad Cluster" PowerVR GT7400 @ 450 MHz
- GPU Performance - 73.9/147.8 GFlops (FP32/FP16)
A9X
- Manufacturer - Samsung on 14nm 14LPE FinFET process*
- Die Size - 120-130 mm2
- Designer - Apple
- CPU Type - 1.5GHz Fourth Generation Custom Apple Core
- Core Count - 3
- Instruction Set - ARMv8-A (with custom Apple extensions)
- L1 Cache - 64/64KB
- L2 Cache - 2MB
- L3 Cache - 8MB
- RAM - 2GB LPDDR4 @ 1600 MHz (128 bit interface)
- Max Theoretical Memory Bandwidth - 25.6 GB/s
- GPU Type - "Octo Cluster" PowerVR GT7800 @ 450 MHz
- GPU Performance - 147.8/295.6 GFlops (FP32/FP16)
The purpose of this piece is to preview and predict the features of the iPhone 6S (and related iPads). If you have not read MacRumors rumor roundup, please go do so before reading this.
A Look Back to Last Year
This will mark the third year I've done an Apple SoC and device preview. I did an A7 prediction thread two years ago, and an A8 prediction last year, both of which you can find linked in the more reading section. For the A7, the prediction proved to be almost exactly on for the GPU, just narrowly missing on operating frequency but getting the configuration correct. It also correctly predicted the foundry and process, though that was not hard with the strength of the rumors suggesting Samsung's 28nm process. It also predicted the RAM type (LPDDR3) and size (1GB), but missed on frequency (1333 vs. 1600 MHz).
The A8 prediction was very close on the CPU, but missed significantly on the GPU, as the device ended up featuring four GPU "clusters" again instead of the predicted six. A major miss was the amount of RAM predicted. Prior to the A8 and iPhone 6, at least every other generation of iPhone had seen an increase in RAM capacity. The 10% shrink in die size compared to the A7 was also quite unexpected, especially given that the die roughly doubled in total number of transistors.
The post also accurately predicted the cellular radio and transceiver, which was greatly assisted by leaked PCBs that showed the 9625M modem. The modem necessitated the presence of the companion WTR1625 and WFR1620 chips, which were also located. PCB analysis also revealed the QFE1100 envelope tracker for dynamic RF chain power adjustment. The WiFi footprint had grown relative to previous generations, and the existence of an integrated Broadcom solution for wireless AC made it clear that a solution featuring the AC standard was likely.
Display
The display was predicted to change, in line with the strong rumors of both 4.7 inch and 5.5 inch displays. No quality improvements were predicted, mostly based on the fact that no rumors or analysis of leaks led us to believe the displays would be anything but a shift in size from the 4 inch screen, which had already improved color gamut and adopted in-cell touch assemblies over the original retina display. Apple made it a point to mention specific improvements to the display in their keynote, a primary example being dual domain pixels for improved color distortion performance on off-angles. Anandtech and DisplayMates's reviews of the iPhone 6 and 6 Plus displays revealed improved contrast, color accuracy and off-angle performance. To summarize, it was the best LCD display ever featured in a phone.
The use of sapphire over gorilla glass for the display was perhaps the hottest talking point surrounding the new iPhone's development, which was already a staple of the iPhones' camera lenses. There was much debate over whether they would be included, despite the strength of the rumors regarding their development. It was eventually revealed that gorilla glass was again in use, which made sense given the ultimate resolution of Apple's bankruptcy settlement with sapphire producer GT Advanced. It seems clear that Apple intended to use sapphire in their displays, but the plans succumbed to poor management at GT Advanced.
Flash Storage
NAND storage was an interesting topic because of schematic leaks that showed 16GB, 64GB and 128GB storage options, making for an easy prediction. This matched with flash makers' announcements of NAND densities sufficient to reach 128GB modules with eight die inside a single package. The confusing part of the schematic rumors definitely centered around a 1GB flash module, which was briefly mistaken for RAM. This 1GB NAND was never located on the board or inside the application processor package, so if the leak was legitimate, it would need to be housed inside the flash memory package with the other die. As to its purpose, I suggested at the time that it could be related to secure storage of fingerprint or health data. Given that this 1GB pool was never confirmed to exist, we'll have to relegate it to rumor once again.
Battery
Thanks to increased chassis size due to larger displays, larger capacity batteries seemed a given - a desire that had been in constant war with Apple's quest for thinner devices. There were several leaks featuring potential batteries for the new iPhones that turned out to be accurate. The iPhone 6 had 1810 mAh and 2100 mAh capacities rumored, whereas the iPhone 6 Plus had 2915 mAh. The smaller size turned out to be accurate for the iPhone 6, while the only 6 Plus leak was also accurate. Apple changed battery chemistry at the time of the iPhone 5 with much of the rest of the industry. The new chemistry allowed for more efficient discharge and perhaps enhanced battery lifetime, and there hasn't been any interesting battery rumors to cover since that point outside of capacity size.
CPU
CPU architecture is an area in which Apple had made tremendous strides in successive years prior to last year's A8. A6 was Apple's first full custom design, which was followed by an industry first 64-bit A7. The year over year 2x performance improvement from each of these was made possible by ISA and process technology advancements, along with a very generous helping of engineering talent. Last year, Apple had a full node transition from 28nm to 20nm at their disposal, but my CPU prediction was very muted based on the strength of the A7 design.
Apple's A7 was a huge advancement in mobile CPU architecture. With an issue width of six micro-ops, Apple was rivaling desktop processors. Nvidia's custom 64 bit core does have an issue width of seven, but we're likely reaching the practical limits of issue width for today's workloads. Skylake, Intel's latest microarchitecture, also features a dispatch of six micro-ops per cycle. A handful of smaller optimizations to the architecture were predicted, including things like improved ALUs, an improved memory architecture, and reduced latency for complex ALU operations. All of those improvements came to light through Anandtech's benchmarking in their A8 review. Apple also seems to have moved to independent L2 cache designs for each core, as noted by Chipworks. This makes sense, as the L3 last level cache is still shared.
One focus Apple emphasized was sustained performance on A8. Although performance gains were only up to 20% in most cases, they wanted to reduce the throttling becoming more common in mobile processors as thermal limits were reached. As Anandtech's review shows, they definitely achieved this goal. The review remarked how it was the first processor to be put through the benchmark suites that did not throttle during performance tests. This is pretty significant, as mobile SoC makers had been progressively pushing for more peak performance to advertise the gains for which consumers had developed an appetite.
GPU
The GPU prediction from last year had two main components, the first of which was another architecture upgrade. The A7 saw Apple adopt Imagination Technologies' "Rogue" Series 6 architecture just a little over a year after its announcement. In early 2014, ImgTec announced the XT series addition to the Rogue family. It had features like ATSC compression support, as well as finer DVFS options, dubbed 'PowerGear'. The XT series also increased FP16 throughput by 33%, effectively making it double that of the FP32 throughput. While FP16 vs. FP32 usefulness on mobile is a debate of its own, to end result was Apple claiming up to 50% speed improvements. Most benchmarks showed modest gains, with some cases pushing towards the 50% theoretical improvement.
The second reason a six cluster GPU was predicted was the full node transition from 28nm to 20nm. With mild changes to the CPU predicted, there would be plenty of extra transistors for GPU increases if the die size stayed relatively constant. Apple had also pushed GPU performance quite hard generation after generation. It turns out that transistor growth for a given cluster count increases quite dramatically going from the G6x30 series GPUs to their GX6x50 successors. In what could theoretically be a 50% increase in transistor density thanks to a whole node shift, the GPU only shrunk about 10% from A7 to A8. It's clear that the extra ALUs, compression hardware, and cluster power control circuits have a hefty transistor overhead.
RAM
The stalemate on application processor RAM was somewhat of a surprise. Prior to the iPhone 6 and 6 Plus, the iPhone had doubled RAM at least every other generation. We did see the iPad Air 2 gain 2GB of RAM with the A8X, but that move was to help support the split-screen multi-tasking of the full size iPad. Also, while there had been rumors of Apple adopting LPDDR4 for the A8, it seemed too early in the memory's product lifecycle for sufficient volume to be available to Apple, resulting in LPDDR3 being used again.
Regarding RAM, I'd like to call back to the A7 design when Chipworks examined the package. There was a large increase in pads, which suggested that we would see a 128-bit memory interface rather than the 64-bit memory interface that had become common on Apple processors. This large padout was again seen on the A8 package, and we still have no answer for the increased number of pads, as those processors feature 64-bit memory interfaces. Perhaps Chipworks has since discovered the answer and keeps it in one of their paid reports, but it's an interesting development to keep in the back of your mind as we move forward.
Audio
Device audio is another area of small but keen interest. Android has recently seen some adoption of 24-bit audio to cater to audiophiles, and suggestions that Apple would require artists to submit 24-bit masters to iTunes stoked the flames. Anandtech covers audio performance pretty well in their A8 review, giving the audio codec praise for its harmonic distortion and noise performance. This seems to be an area where we may see some activity, as some think there are obvious gains to be had by improving audio bit depth.
Rather than enhancing the DAC or amplifier stages of the audio chain, I also theorized that Apple could leverage their MFi program to create hardware through their recently acquired Beats brand that featured built-in DACs and/or amps to market to the audiophile segment. This possibility did turn out to be a reality, but via third party Philips.
A8X
Last year, we also saw the re-introduction of an iPad specific processor via the A8X. After the A7 remained the only application processor design for the iPhone 5S, iPad Air, and Retina iPad mini, many had assumed that Apple had now deemed GPU performance sufficient for one SoC to power all the iDevices released in a given year. A8X proved that wrong, with a GPU double the size, a third CPU core, double the L2 cache, and double the RAM. The new multitasking features in iOS served as an impetus to have a high performance tablet version of the A8 SoC. In retrospect, the design effort that the A7 represented could be the reason it existed as the sole processor product of Apple that year. Apple was devising a huge architecture shift on a very new ISA in addition to a new GPU architecture adoption, so all design resources may have been required to hit their target goals. With the more modest architecture changes on A8, they could devote resources to developing the A8X on a similar timeframe.
One last thing to mention about the A8X is what the die shot shows. We know from diagnostics that the CPU cores have 2MB of L2 cache. If you grab the L2 cache from the A8 shot, you can superimpose it and see that there clearly appear to be blocks two times the size of the A8 L2 cache. However, there are only two of them despite there being three cores. What's going on there? An asymmetric cache structure where two cores share an L2? That seems unlikely and I'm probably missing something obvious, but it's an interesting difference to point out.
Fixed Function Blocks
I have saved the most interesting part of the A8 for last, and that centers around transistor density and die size. As soon as Apple announced the transistor count had approximately doubled from the A7 to A8, I began to imagine ways they could have incurred such transistor growth. An obvious candidate would have been the aforementioned predicted GPU size increase. SRAM size increase would be another obvious one, as their six transistor bit structures are easy ways to push up transistor count while also being very dense. That didn't turn out to be true, either, with L1, L2 and L3 caches all staying the same size.
The GPU did see its transistor budget increase quite a bit, but not by a large amount, while the CPU increases were very minimal. This turns us to fixed function blocks. Apple's efforts in fixed function and mixed signal blocks have been an area of interest for some time. From the keynote, we know two significant fixed function blocks that would cause transistor growth. The first was a change to the ISP that allowed for 240 fps slo-mo video, doubling the previous 120 fps slo-mo. The second was what Apple referred to as a "desktop class" image scaler. With the introduction of the iPhone 6 and 6 Plus, graphical elements were no longer necessarily neat multiples of one another across generations, so Apple allowed the iPhone to downsample graphics drawn at a higher resolution to preserve nice pixel multiples.
These fixed function additions don't tell the whole story, but they begin to explain how Apple could have such large transistor growth without significantly increasing the main sources of transistors: GPU, CPU and SRAM. What is also very interesting is the transistor density scaling they achieved. From an absolute ideal standpoint, a given design could scale to approximately half of its size on a full node shrink. However, not all circuit types scale the same way, and transistors don't always scale the same way in channel width as they do in length. Realistically, you are then looking at a scale factor of probably 0.7. However, going from 1 billion to 2 billion transistors (and perhaps rounding is on Apple's side here) on a die that shrank about 10% means Apple is looking at a near 0.45 scale factor. SRAM/cache would be the best bet for increased transistor density, yet the caches did not change at all in capacity and Apple still achieved this scaling factor. Once again, Apple has some extremely talented engineers.
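As a rough check on that 0.45 figure, here is a minimal Python sketch. It assumes the roughly 10% die shrink and the doubled transistor count mentioned earlier, and treats the 0.5 ideal and 0.7 realistic factors as ballpark values; the function and variable names are mine, purely for illustration.

```python
def area_scale_factor(die_area_ratio, transistor_ratio):
    """Average area per transistor on the new die relative to the old one."""
    return die_area_ratio / transistor_ratio

# Assumed inputs, rounded as in the post (not measured values).
DIE_AREA_RATIO = 0.90    # A8 die is about 10% smaller than the A7 die
TRANSISTOR_RATIO = 2.0   # about 1 billion -> about 2 billion transistors

IDEAL_SCALE = 0.5        # perfect full node shrink
REALISTIC_SCALE = 0.7    # ballpark once uneven scaling is accounted for

a8_scale = area_scale_factor(DIE_AREA_RATIO, TRANSISTOR_RATIO)
density_gain = 1 / a8_scale

print(f"A8 area per transistor vs. A7: {a8_scale:.2f}")
print(f"Implied density gain: {density_gain:.2f}x")
print(f"Beats the ideal 0.5 scale factor: {a8_scale < IDEAL_SCALE}")
```

With these rounded inputs the factor comes out to 0.45, which is better than even the ideal 0.5. If Apple rounded the transistor counts generously, the real factor would sit somewhat higher.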
What You Should Have Read This Past Year
In this section, I'd like to highlight some reading from the past year that provides context for the state of the industry and technology at this time.
Anandtech's iPhone 6 Review - Anandtech continues to put out the best iDevice reviews even after losing Shimpi and Klug to Apple. They are the reason we have many of the microarchitecture details on Apple's custom SoCs.
Anandtech's iPad Air 2 Review - In addition to the above, this gives us a look at the A8X.
Displaymate iPhone 6 and 6 Plus Display Review - A critical look at display performance on the iPhones released last year.
Enhanced Sapphire - DisplayMate gives a short technology overview of a new type of sapphire that beats glass in reflectance tests, which means better ambient light performance, all while still being more scratch resistant than glass.
Thin Loop Heat Pipe - Fujitsu has developed a heat pipe that dramatically improves cooling performance in a form factor suitable for phones. This isn't just about being able to use more power in a given thermal envelope. Lowering heat for normal CPU operations lowers leakage current due to heat and can have other benefits.
Light-splitting camera patent - Apple patented a system for splitting light into its individual color components to enhance image quality. They adapted a technology often seen in larger video capture devices to a mobile form factor.
Tile-based rendering overview - ImgTec explains how their graphics processors work using tile-based rendering. It's a proven power efficient rendering method, which is why it has found such a good home in mobile spaces.
Apple Watch System-In-Package - A great short blog entry that covers the packaging advance that the Apple Watch represents.
Semiconductors from idea to product - A great overview of the flow of semiconductor IP from idea to real silicon. Gives great context for judging the feasibility and time-to-market of rumored features.
An Introduction to Semiconductor Physics, Technology, and Industry - Great companion read to the link above. Gives a broad but detailed overview of the state of semiconductors, process flows, and challenges faced by the IC industry. If you only read two links, make it these two.
Inside ARM's Cortex-A72 microarchitecture - A dive into ARM's latest high performance 64-bit reference core. This will power many of the rival smartphones Apple will face and will serve as one of the comparison points in benchmarks.
A comparison of HBM, Wide IO, and Hybrid Memory Cube - An overview and comparison of the 2.5D and 3D memory technologies that will eventually take over for LPDDR memory in mobile devices.
Avago purchase of Broadcom - Avago and Broadcom both have IP in several generations of the iPhone. The acquisition could have implications for Apple and the industry as a whole.
Teardown of the iPhone 6 Plus battery - Nice overview of today's battery technology with a cost analysis thrown in.
Comparing OpenGL ES To Metal - A benchmark comparison of the standard OpenGL ES and Apple's proprietary Metal drivers.
The IP licensing model - A follow-on to the IP development process from another ImgTec engineer. It covers the relationship between device IP licenser and licensee.
Understanding Qualcomm's Snapdragon 810 - While it covers the entire Qualcomm Snapdragon solution, the sections on the RF components are good, especially since the next iPhone will use some of them.
Understanding Qualcomm's ImproveTouch - On the eve of Force Touch displays, it's important to understand the competitors' technologies and solutions.
Galaxy Note 5 Display Shootout - Samsung now makes the best reviewed smartphone displays, so they are the benchmark for Apple.
Hunting Down a Turncoat - A very well written account of the former TSMC executive who became disenchanted with his employer over a skipped promotion and eventually gave his company's technology process secrets to Samsung.