AW's S-series Specs & Benchmark

TigeRick · Sep 29, 2023

Apple has finally upgraded the SoC inside the Ultra 2 to the S9 SiP. We still don't know much about this SoC, so I am creating this thread to collect technical information and benchmark results for current and future S-series chips from Apple. I'm still not sure about the amount of RAM inside the S8 and S9. I remember seeing some tweets mentioning 1.5GB of RAM; can anyone confirm?

The first benchmark comes from Speedometer 2.1; so far it shows a +38% performance improvement on web applications. Not bad, the speed is approaching that of the iPhone 6s with its A9 SoC. Are there any other benchmarking apps? If you can point me to one, I will update the table accordingly, thanks.

FYI, Samsung's Galaxy Watch5 with 2xA55 @ 1.18GHz scored 7 runs/min 😂

The latest SoC from Qualcomm, the Snapdragon W5+ with 4xA53 @ 1.7GHz, scored 11 runs/min. That's better than Samsung; this SoC is rumored to be used in the upcoming Pixel Watch 2.

	Ultra	Ultra 2	Ultra 3?
Year Announced	2022	2023	2024
SoC	S8 1.8GHz Dual Core	S9 ? Dual Core
Process Node	N7P	N5
Transistors	3.5 billion	5.6 billion
Architecture	A11's E-core	A15's E-core
GPU	1 core - 64 ALU?	1 core - 256 ALU?
RAM	1.5 GB LPDDR4X	? GB LPDDR4X
Neural Engine	NA	4-core
Storage	32GB	64GB

Speedometer 2.1 (Runs/min)	27.0	37.1
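
For anyone who wants to check the arithmetic, here is a minimal Python sketch that compares the Speedometer 2.1 scores quoted in this post. The dictionary keys and function names are just illustrative labels, not anything from the benchmark itself. Note that the two rounded table values work out to roughly +37%, so the +38% figure presumably comes from unrounded scores or a slightly different run.

```python
# Speedometer 2.1 scores in runs/min, copied from the post above.
scores = {
    "Ultra (S8)": 27.0,
    "Ultra 2 (S9)": 37.1,
    "Galaxy Watch5": 7.0,
    "Snapdragon W5+": 11.0,
}


def percent_gain(new: float, old: float) -> float:
    """Relative improvement of `new` over `old`, in percent."""
    return (new / old - 1.0) * 100.0


def speed_ratio(a: float, b: float) -> float:
    """How many times faster score `a` is than score `b`."""
    return a / b


s9_gain = percent_gain(scores["Ultra 2 (S9)"], scores["Ultra (S8)"])
print(f"S9 over S8: +{s9_gain:.1f}%")

# Compare the S9 against the two Wear OS chips mentioned in the post.
for rival in ("Galaxy Watch5", "Snapdragon W5+"):
    ratio = speed_ratio(scores["Ultra 2 (S9)"], scores[rival])
    print(f"S9 vs {rival}: {ratio:.1f}x")
```

Higher is better for runs/min, so a ratio above 1 means the S9 completed more runs in the same time.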

srknpower · Sep 29, 2023

I remember the Series 7 having 1.5 GB of RAM.

OrenLindsey · Sep 29, 2023

What is the RAM on the Ultra 2? That's what I want to know.

jz0309 · Sep 29, 2023

How are these benchmarks taken? I searched the App Store for “speedometer”, but there doesn't appear to be an iOS app, let alone a watch app.
How can you measure this without an app?
Can you please provide some background info?

TigeRick · Sep 29, 2023

jz0309 said:
How are these benchmarks taken? I searched the App Store for “speedometer”, but there doesn't appear to be an iOS app, let alone a watch app.
How can you measure this without an app?
Can you please provide some background info?

Speedometer 2.1

Just open the link in the browser; no app is required.

TigeRick · Sep 29, 2023

srknpower said:
I remember Series 7 has a ram of 1.5 GB.

Yeah, update with 1.5GB RAM

jz0309 · Sep 29, 2023

TigeRick said:
Yeah, update with 1.5GB RAM

And I do that on the watch how?

TigeRick · Sep 29, 2023

jz0309 said:
And I do that on the watch how?

What? Here is the URL to type: https://browserbench.org/Speedometer2.1/

altaic · Sep 29, 2023

jz0309 said:
And I do that on the watch how?

How To Use Apple Watch's Hidden Web Browser: Surf The Web From Your Wrist

A hidden web browser on your wrist.

screenrant.com

name99 · Sep 29, 2023

How/where do you get the GPU numbers?
I agree that one core is the obvious value, and the GPUs (before A17) were, like recent nVidia designs, basically
- a single core shares L1 (I, D, and scratchpad)
- is divided into four quadrants
- each quadrant has 32 lanes (and an I-L0 cache, though Apple's one is rather different from nVidia's)

Meaning that for the most obvious (and correctly crafted...) sort of benchmark (something like how many FMAs the core can do per cycle), you should see a core as looking like it can do 128 FMAs per cycle.

BUT there are multiple ways things can become more interesting. I won't go into all of these except to point out that for Apple (nVidia does things very differently, but also very interesting) in the past (starting with M1) they have, in one lane, had a 32bit FMA unit and a 16bit FMA unit. This uses up more area, but saves power if, a lot of the time, you are doing genuine FP16 work and don't have to do that on FP32 hardware.
Now the obvious question is - if you have this duplicated hardware, can you get "double" use out of it? There are at least two possibilities.

One is that you add some (not much required) hardware to the FP32 execution unit so that it can operate on a SIMD2 pair of FP16's. This requires that you occasionally have SIMD2 work of that sort, but that's actually pretty common in graphics. If you do this, it will now (for some benchmarks) look like you have 256 rather than 128 FP units in a core.

Another alternative is if you can effectively provide two instructions per cycle in each quadrant AND you decide to (at least in some situations) also execute FP16 instructions in the FP32 unit. Then a pair of FP16 instructions could execute in parallel, one in the FP16 unit and one in the FP32 unit.
There are multiple ways to provide two instructions per cycle, the trick is doing so at an acceptably low energy level. You could do superscalar statically scheduled code, or a form of VLIW, or even play timing games (nVidia's favorite way of handling this) so that, one way or another, the instruction pipeline operates at twice the frequency of the execution pipeline.

Point is, it's very interesting if some benchmarks are showing a supposed 256 ALUs on a core, and then we would like to know many details! Isn't this the same as the A15/M2 GPU core (as far as I know, very definitely 128 ALUs, and no weird tricks like the ones I have described)? Or did they slide in the A17 GPU (seems a strange decision, but who knows)? What EXACTLY is the benchmark that's suggesting 256 ALUs measuring?
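
The counting argument above can be sketched in a few lines of Python. This is only a toy model of the numbers in this post (four quadrants, 32 lanes each, and an optional doubling when two FP16 operations retire per lane per cycle); the names are made up for illustration and nothing here is measured on real hardware.

```python
# Toy model of how many FMAs per cycle one GPU core *appears* to do.
QUADRANTS_PER_CORE = 4    # from the description above
LANES_PER_QUADRANT = 32   # from the description above


def apparent_fma_per_cycle(fp16_ops_per_lane: int = 1) -> int:
    """FMAs per cycle a benchmark would observe for a single core.

    fp16_ops_per_lane = 1: one FMA per lane per cycle (plain case).
    fp16_ops_per_lane = 2: two FP16 FMAs per lane per cycle, whether via a
    SIMD2 FP32 unit or by dual-issuing to the FP16 and FP32 units.
    """
    return QUADRANTS_PER_CORE * LANES_PER_QUADRANT * fp16_ops_per_lane


baseline = apparent_fma_per_cycle()    # plain FP32 benchmark
doubled = apparent_fma_per_cycle(2)    # FP16 benchmark that hits the doubled path

print(f"baseline: {baseline} FMAs/cycle, doubled FP16 path: {doubled} FMAs/cycle")
```

So a benchmark reporting 256 "ALUs" per core is consistent with either mechanism, which is why it matters exactly what the benchmark measures.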

TigeRick · Sep 29, 2023

name99 said:
How/where do you get the GPU numbers?

Well, it's from my Excel database of figures I've collected in the past. 🤣

I cannot recall where I got the information, but you could search for the A15's GPU; it seems the A15 is the only A-series chip with such a high ALU count (1280), which at 600MHz nets about 1.5 TFLOPS. And that's the reason why the A15 iPhones have the longest battery life: the GPU runs at a low clock speed.

Maybe @leman has some insights?
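
As a rough check of the 1.5 TFLOPS figure, here is a small Python sketch. It assumes the usual convention that one FMA counts as 2 floating-point operations and that every ALU retires one FMA per cycle; the ALU count and clock are simply the numbers quoted in this post, not verified values.

```python
def peak_tflops(alus: int, clock_mhz: float, flops_per_alu_per_cycle: int = 2) -> float:
    """Theoretical peak throughput in TFLOPS.

    Assumes each ALU completes one FMA per cycle and an FMA counts as
    2 FLOPs (multiply + add). Real workloads will fall short of this.
    """
    flops_per_second = alus * flops_per_alu_per_cycle * clock_mhz * 1e6
    return flops_per_second / 1e12


# Figures quoted in the post: 1280 ALUs at 600 MHz.
claimed = peak_tflops(alus=1280, clock_mhz=600)
print(f"{claimed:.3f} TFLOPS")
```

With those inputs the formula lands close to the quoted 1.5 TFLOPS, so the figure is at least internally consistent; whether the inputs themselves are right is a separate question.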

altaic · Sep 29, 2023

Not that it’s definitive, but the diagram Apple shows in their Sept. 12 event presentation (at 9:20) appears to have 4 GPU cores.

name99 · Sep 29, 2023

altaic said:
Not that it’s definitive, but the diagram Apple shows in their Sept. 12 event presentation (at 9:20) appears to have 4 GPU cores.

Of course who knows if the four rectangles in that presentation just correspond to four quadrants of a single core? It's not an outrageous level of artistic license. I mean 4 cores is basically an iPhone-level GPU, which seems insane!

name99 · Sep 29, 2023

TigeRick said:
Well, it's from my Excel database of figures I've collected in the past. 🤣

I cannot recall where I got the information, but you could search for the A15's GPU; it seems the A15 is the only A-series chip with such a high ALU count (1280), which at 600MHz nets about 1.5 TFLOPS. And that's the reason why the A15 iPhones have the longest battery life: the GPU runs at a low clock speed.

Maybe @leman has some insights?

Of course that assumes the 600MHz is trustworthy...

I tend to be EXTREMELY dubious of microbenchmarks run by people who have no idea how to interpret the results, or what is being tested. It's hard to screw up the meaning of a GB6 score, but it's VERY easy to take some code that you don't understand that's SUPPOSED to measure a specific feature of a core and have it fail because that feature now operates in a way whoever wrote the original microbenchmark did not expect. That's why so much of my first two PDFs is about drawing graphs, looking for anomalies, and trying to understand how results do (or don't) fit together.

altaic · Sep 29, 2023

name99 said:
Of course who knows if the four rectangles in that presentation just correspond to four quadrants of a single core? It's not an outrageous level of artistic license. I mean 4 cores is basically an iPhone-level GPU, which seems insane!

Oh, I agree. I was a bit shocked and perplexed when I watched the event live. It’s also odd that they claimed it is only 30% faster if there are 4x the cores. OTOH, it could use much less power if designed to be clocked much lower while also being able to switch cores off. Either way, 256 ALUs is pretty remarkable no matter how it’s configured.

I forget if they mentioned the lithography node— do we know if it’s produced using N3B? I would assume that it is, since it’s the first new watch SiP since S6.

srknpower · Sep 29, 2023

By the way, Ultra’s S8 chip is based on A13. S4 and S5 are based on A12.

leman · Sep 29, 2023

TigeRick said:
I cannot recall where I got the information, but you could search for the A15's GPU; it seems the A15 is the only A-series chip with such a high ALU count (1280), which at 600MHz nets about 1.5 TFLOPS. And that's the reason why the A15 iPhones have the longest battery life: the GPU runs at a low clock speed.

Maybe @leman has some insights?

An Apple GPU core has 128 “compute units” (4x32-wide data-parallel SIMD processors). The A15 GPU runs at 1.33GHz, if I remember correctly.
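
To see how the 128-per-core figure squares with the 1280 ALU claim quoted above, here is a small Python sketch. The 5-core count for the A15 is an assumption brought in from outside this thread (the A15 shipped in 4-core and 5-core GPU variants), the 1.33GHz clock is the recalled value from this post, and the 2 FLOPs per FMA convention is the usual one.

```python
LANES_PER_CORE = 128    # 4 x 32-wide SIMD, per this post
CLAIMED_ALUS = 1280     # figure quoted from the earlier post
A15_GPU_CORES = 5       # assumption: the larger A15 GPU configuration
CLOCK_GHZ = 1.33        # recalled value from this post, not verified

# How many cores would 1280 ALUs imply at 128 lanes per core?
implied_cores = CLAIMED_ALUS // LANES_PER_CORE

# Peak FP32 estimate using the per-core figure instead (2 FLOPs per FMA).
alus = A15_GPU_CORES * LANES_PER_CORE
tflops = alus * 2 * CLOCK_GHZ / 1000.0

print(f"1280 ALUs would imply {implied_cores} cores")
print(f"{A15_GPU_CORES} cores -> {alus} ALUs -> {tflops:.2f} TFLOPS at {CLOCK_GHZ} GHz")
```

Under these assumptions, half the ALUs at a bit more than twice the clock gives a peak in the same ballpark as the 1280 ALU / 600MHz combination, which may be how the doubled ALU count came about.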
