So when you run out of physical RAM all you can do is (hopefully gracefully) die?

I wonder if that is why there haven't been any games that really push iPhone OS devices. It appears that the (later) hardware is capable of Gears of War-type graphics (purely on a technical level), but the RAM shortage can be problematic.

If you run out of physical RAM in iPhone OS, the OS will notify your application and you can either (sketched below):
1. free up some memory or
2. die
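
To make the two options above concrete, here is a minimal sketch in C. The callback name and the cache are hypothetical, purely for illustration; in the real SDK the notification arrives as an Objective-C memory-warning callback rather than a C function.

#include <stdlib.h>

/* Hypothetical cache of decoded assets the app keeps around for speed
   but can rebuild later if it has to. */
static void *g_texture_cache = NULL;

/* Hypothetical hook invoked when the OS reports memory pressure. */
void on_low_memory_warning(void)
{
    /* Option 1: free up some memory. */
    free(g_texture_cache);
    g_texture_cache = NULL;

    /* Option 2 is implicit: if the app doesn't (or can't) release enough,
       the OS terminates the process. */
}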

But I don't think that's why developers aren't pushing the envelope. My guess is that it's for backward compatibility. I have a first-generation iPod touch, and the thing is slow as sin compared with the most recent generation and doesn't support OpenGL ES 2.0, so no games based around shaders. I think there are too many people like me for developers to justify the extra development effort for higher-powered games.

In terms of RAM usage: developers can always manually load and unload resources to stay within RAM limits, and usually they can make smarter decisions about which resources to load/unload than the OS VM pager can. I also don't know of any game consoles (correct me if I'm wrong) which have disk-backed virtual memory, and developers seem to be able to work with them just fine (though they tend to complain about the RAM squeeze, to be sure).
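
As a rough illustration of that kind of manual resource management, here is a hedged sketch in C of an app-level cache with an explicit RAM budget; the budget, the names, and the least-recently-used policy are all assumptions for the example, not anything from a real engine.

#include <stdlib.h>

#define RAM_BUDGET (24u * 1024u * 1024u)   /* assumed per-app budget: 24 MB */

typedef struct Asset {
    const char    *name;
    unsigned char *bytes;       /* NULL while unloaded */
    size_t         size;
    unsigned long  last_used;   /* for a simple least-recently-used policy */
} Asset;

/* Evict the least recently used loaded asset; returns 0 if nothing is loaded. */
static int evict_one(Asset *assets, size_t count, size_t *in_use)
{
    Asset *victim = NULL;
    for (size_t i = 0; i < count; i++)
        if (assets[i].bytes && (!victim || assets[i].last_used < victim->last_used))
            victim = &assets[i];
    if (!victim)
        return 0;
    *in_use -= victim->size;
    free(victim->bytes);
    victim->bytes = NULL;
    return 1;
}

/* Load an asset, evicting older ones first so total usage stays under the
   budget.  load_from_disk() is a stand-in for the real I/O. */
unsigned char *load_asset(Asset *assets, size_t count, size_t *in_use,
                          Asset *wanted, unsigned long now,
                          unsigned char *(*load_from_disk)(const char *, size_t))
{
    while (*in_use + wanted->size > RAM_BUDGET)
        if (!evict_one(assets, count, in_use))
            return NULL;                /* asset bigger than the whole budget */
    wanted->bytes = load_from_disk(wanted->name, wanted->size);
    wanted->last_used = now;
    *in_use += wanted->size;
    return wanted->bytes;
}

The point is simply that the app knows which assets it will need next (the next level, the textures near the player) and can pick eviction victims accordingly, whereas a generic pager only sees anonymous pages.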

Back in the early days, adding a VM pager was considered a (somewhat) acceptable way to increase effective memory capacity without getting more RAM, but the gap between hard disk and main memory speed has widened since then, so the purpose of the pager is now more a last-ditch stability feature: if the working set of your workload exceeds physical memory, you are pretty hosed performance-wise.
 
I'm sorry, but who are you again? (Not my words...)

haha ok. I am honored and flattered to be both copied and quoted by such a well-respected poster.

<snip>

Back in the early days, adding a VM pager was considered a (somewhat) acceptable way to increase effective memory capacity without getting more RAM, but the gap between hard disk and main memory speed has widened since then, so the purpose of the pager is now more a last-ditch stability feature: if the working set of your workload exceeds physical memory, you are pretty hosed performance-wise.

I assume you are discussing the Access Speed, not the size gap?

I am curious to see how this does or doesn't change on desktop systems in the future as SSDs become more popular/standard. While an SSD is still slower to access than physical memory, it is much quicker than rotating storage.
 
Looked plenty fast enough in the videos, not like I'm doing Handbrake rips on it.

Interesting tidbit, but is the processor really important in a product like this?

When I look at the performance difference between my original iPhone and an iPhone 3G (I don't have a 3GS), I would certainly say "yes".
 
haha ok. I am honored and flattered to be both copied and quoted by such a well-respected poster.



I assume you are discussing the Access Speed, not the size gap?

I am curious to see how this does or doesn't change on desktop systems in the future as SSDs become more popular/standard. While an SSD is still slower to access than physical memory, it is much quicker than rotating storage.

Definitely access speed. I believe hard disk capacity has actually increased at an exponential rate exceeding both CPU speed and RAM speed. Hard disk latency, on the other hand, has only decreased by a factor of a few since the early 1990s.

I'm curious to see how SSDs affect things too. I've run benchmarks that suggest that disk swapping performance on Mac OS X is limited by disk bandwidth, not latency, and so SSDs' biggest advantage (the latency) wouldn't help there, but the increased bandwidth would.
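
A back-of-the-envelope way to see that bandwidth-versus-latency split (all numbers below are assumed round figures, not measurements): for a single 4 KB page the seek dominates on a hard disk, but once the pager clusters pages into large transfers the time is almost all bandwidth.

#include <stdio.h>

int main(void)
{
    /* Assumed round figures for circa-2010 hardware: */
    const double hdd_latency = 8e-3,   hdd_bw = 80e6;    /* ~8 ms seek, ~80 MB/s  */
    const double ssd_latency = 0.1e-3, ssd_bw = 200e6;   /* ~0.1 ms,    ~200 MB/s */
    const double sizes[] = { 4.0 * 1024, 1024.0 * 1024 }; /* one page, one cluster */

    for (int i = 0; i < 2; i++) {
        double s = sizes[i];
        printf("%9.0f bytes: HDD %6.2f ms, SSD %6.2f ms\n", s,
               (hdd_latency + s / hdd_bw) * 1e3,
               (ssd_latency + s / ssd_bw) * 1e3);
    }
    return 0;
}

With those made-up numbers a lone 4 KB page-in is ~8 ms on the hard disk (almost pure latency) versus ~0.1 ms on the SSD, while a 1 MB cluster is ~21 ms versus ~5 ms, i.e. mostly transfer time, which is consistent with swapping being bandwidth-bound.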

On the other hand, maybe you don't want to use an SSD to back virtual memory, since it could negatively affect the longevity of the drive? I really don't know anything about SSDs though.
 
Now assume that when whichIndexArray contains FALSE, indexArray1 contains large random numbers. Then the processor will try to read a random location.
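
For anyone joining late, here is a minimal sketch in C of the kind of loop being described; whichIndexArray and indexArray1 are the names from the earlier post, while the types and layout here are assumptions.

#include <stddef.h>

/* Sum the selected entries of data[].  If the branch predictor guesses
   "taken" on an iteration where whichIndexArray[i] is actually FALSE, the
   load data[indexArray1[i]] can be issued speculatively with a large random
   index, touching an arbitrary address before the misprediction is
   detected and the speculative work is unwound. */
long sum_selected(const int *whichIndexArray, const size_t *indexArray1,
                  const long *data, size_t n)
{
    long sum = 0;
    for (size_t i = 0; i < n; i++) {
        if (whichIndexArray[i])
            sum += data[indexArray1[i]];
    }
    return sum;
}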

To the naïve programmer (rather than the CPU designer), it would
seem to be obvious that a speculatively-executed instruction
could not be permitted to cause an exception of any kind. Anything
else would violate the semantics of the ISA.

Either any potential exception would be delivered later when the
code path hit that instruction, or speculative exceptions
simply would not occur. (That is, if any speculatively-executed
instruction would cause an exception - that speculative
instruction is unwound and never happened.)
 
To the naïve programmer (rather than the CPU designer), it would
seem to be obvious that a speculatively-executed instruction
could not be permitted to cause an exception of any kind. Anything
else would violate the semantics of the ISA.

Either any potential exception would be delivered later when the
code path hit that instruction, or speculative exceptions
simply would not occur. (That is, if any speculatively-executed
instruction would cause an exception - that speculative
instruction is unwound and never happened.)

Really? Why? If I'm speculatively executing a floating-point instruction, all progress has to stop if it generates a divide-by-zero? If the speculative instruction is a load/store, I can't keep going if there is a cache miss? If a conditional branch is followed by another conditional branch, I have to stall until I determine if the first conditional branch was correctly predicted?

In reality, in most out-of-order-retire microarchitectures, these things are permitted to occur. If it turns out that the code branch should not have been taken, things are unwound and you pay any applicable time penalty. You just have to make your branch prediction correct often enough that the benefit of guessing right most of the time more than makes up for the penalty you pay when you guess wrong.
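
A toy way to see that trade-off (my own sketch, nothing from the posts above): the loop body is identical in both runs, but the branch is cheap when its outcome is predictable and noticeably more expensive when it is effectively random.

#include <stdio.h>
#include <stdlib.h>
#include <time.h>

/* Time a loop whose branch depends on the data. */
static double time_sum(const int *v, size_t n)
{
    clock_t t0 = clock();
    long sum = 0;
    for (int pass = 0; pass < 100; pass++)
        for (size_t i = 0; i < n; i++)
            if (v[i] >= 128)            /* the branch being predicted */
                sum += v[i];
    volatile long sink = sum;           /* keep the work from being optimized away */
    (void)sink;
    return (double)(clock() - t0) / CLOCKS_PER_SEC;
}

static int cmp(const void *a, const void *b)
{
    return *(const int *)a - *(const int *)b;
}

int main(void)
{
    enum { N = 1 << 20 };
    int *v = malloc(N * sizeof *v);
    if (!v)
        return 1;
    for (size_t i = 0; i < N; i++)
        v[i] = rand() % 256;

    printf("random data (branch mispredicts often): %.2f s\n", time_sum(v, N));
    qsort(v, N, sizeof *v, cmp);
    printf("sorted data (branch predicts well):     %.2f s\n", time_sum(v, N));
    free(v);
    return 0;
}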

Of course, when you take power consumption into account, the whole calculus changes.
 
In terms of RAM usage: developers can always manually load and unload resources to stay within RAM limits, and usually they can make smarter decisions about which resources to load/unload than the OS VM pager can.

Only because there is one significant app running. If you have multiple applications sharing scarce resources, then the central allocator (the OS VM pager) will have more knowledge than any single app, or at least a more comprehensive notion of what is fair.

Back in the early days, adding a VM pager was considered a (somewhat) acceptable way to increase effective memory capacity without getting more RAM, but the gap between hard disk and main memory speed has widened since then,

Flash is relatively expensive and "smaller" than hard disks. Similarly, the main memory of handhelds is smaller than what commonly appears on mainstream PCs. However, responsiveness is a key quality in many embedded systems. That's why they do away with VM paging and restrict folks to a maximum set of resources.

The reason VM paging gets a pass on most desktops is that the vast majority of people don't have workloads that exceed physical memory, given a current "middle of the road" machine.
 
Really? Why? If I'm speculatively executing a floating-point instruction, all progress has to stop if it generates a divide-by-zero? If the speculative instruction is a load/store, I can't keep going if there is a cache miss? If a conditional branch is followed by another conditional branch, I have to stall until I determine if the first conditional branch was correctly predicted?

In reality, in most out-of-order-retire microarchitectures, these things are permitted to occur. If it turns out that the code branch should not have been taken, things are unwound and you pay any applicable time penalty. You just have to make your branch prediction correct often enough that the benefit of guessing right most of the time more than makes up for the penalty you pay when you guess wrong.

Of course, when you take power consumption into account, the whole calculus changes.

I think that we are in agreement here - as you say "things are
unwound" and the bad thing never happened.

One point though - to me "exception" is an ISA-defined method
of notifying the code stream that something did not work as
expected.

You don't have exceptions for cache misses, since the cache is
not defined in the ISA....

By the way - how can you speculatively execute a store instruction? Do you put the store into the write queue and
let it sit until the store is committed?
 
To the naïve programmer (rather than the CPU designer), it would
seem to be obvious that a speculatively-executed instruction
could not be permitted to cause an exception of any kind. Anything
else would violate the semantics of the ISA.

The speculative results can be "retired" in order. So yes, in order not to violate semantics (or at least to keep semantics compatible across implementations), in-order retirement isn't pragmatically optional.

So you'll need a place to store all of those speculative results until the island-hopped instructions catch up. Storing all of that intermediate stuff costs power and space: 5 instructions in flight will cost less than 10 in flight but more than 2. There is also a relatively small cap on how much instruction-level parallelism is present in code, so you don't have to speculate very far forward before you hit a wall (the current instruction depends on the result of a previous, not-yet-finished calculation).


You have a similar issue with Simultaneous Multithreading (SMT). Thread 0 hits a timer interrupt or a page fault. Do you push both threads' state onto the interrupt stack, or just one? There is also more "state" to save/restore.



Either any potential exception would be delivered later when the
code path hit that instruction, or speculative exceptions
simply would not occur.

Some of the exceptions you can possibly sit on until the proper time.
Others you may not, like a fault from accessing a new page.
 
I think that we are in agreement here - as you say "things are
unwound" and the bad thing never happened.

One point though - to me "exception" is an ISA-defined method
of notifying the code stream that something did not work as
expected.

You don't have exceptions for cache misses, since the cache is
not defined in the ISA....

By the way - how can you speculatively execute a store instruction? Do you put the store into the write queue and
let it sit until the store is committed?

If the cache row is clean you can store into the cache row and, if you speculated incorrectly, re-load the correct data into the cache from main memory. If the cache row is dirty you can't do that, of course. And that doesn't work if you have a write-through cache, which is often the case in multiprocessing situations. If you add buffering, as you suggest, then you can speculate a certain number of levels deep, but, again, if you have multiprocessing this is problematic because you can have cross-processor dependencies to unwind.

The cache may be defined in the ISA, though it doesn't necessarily generate asynchronous exceptions of the sort you seem to be talking about - among other things, that's how software can report cache hit/miss statistics.
 
By the way - how can you speculatively execute a store instruction? Do you put the store into the write queue and
let it sit until the store is committed?

Store where? Into a renamed register... sure. As long as nothing outside of the speculative stream depends on it, you can just throw the contents away.

Into memory? No.
 
Store where? Into a renamed register... sure. As long as nothing outside of the speculative stream depends on it, you can just throw the contents away.

Into memory? No.

It's not uncommon to have a dedicated store queue that holds speculative stores (for use in short-circuiting subsequent loads). In the old days we used to write speculative stores into clean cache lines (and mark them dirty and speculative), as I described above.
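
As a rough model of what such a store queue does (a sketch in C of the idea, not a description of any particular microarchitecture): speculative stores sit in a small buffer, younger loads to the same address are short-circuited out of that buffer, and on a misprediction the buffered entries are simply discarded rather than ever reaching the cache or memory.

#include <stdint.h>
#include <string.h>

#define SQ_ENTRIES 8                 /* assumed queue depth */

typedef struct { uint64_t addr, data; int valid; } SQEntry;
typedef struct { SQEntry entry[SQ_ENTRIES]; int tail; } StoreQueue;

/* A speculative store goes into the queue, not to memory. */
void sq_store(StoreQueue *sq, uint64_t addr, uint64_t data)
{
    sq->entry[sq->tail % SQ_ENTRIES] = (SQEntry){ addr, data, 1 };
    sq->tail++;
}

/* A later load checks the queue first (youngest matching entry wins); only
   on a miss does it fall through to mem[], which stands in for the cache.
   Addresses here are simply word indices into mem[]. */
uint64_t sq_load(const StoreQueue *sq, const uint64_t *mem, uint64_t addr)
{
    for (int i = sq->tail - 1; i >= 0 && i >= sq->tail - SQ_ENTRIES; i--) {
        const SQEntry *e = &sq->entry[i % SQ_ENTRIES];
        if (e->valid && e->addr == addr)
            return e->data;          /* short-circuited from the queue */
    }
    return mem[addr];
}

/* Misprediction: the speculative stores just evaporate... */
void sq_squash(StoreQueue *sq) { memset(sq, 0, sizeof *sq); }

/* ...whereas correct speculation commits them to memory in program order. */
void sq_commit(StoreQueue *sq, uint64_t *mem)
{
    int start = sq->tail > SQ_ENTRIES ? sq->tail - SQ_ENTRIES : 0;
    for (int i = start; i < sq->tail; i++)
        mem[sq->entry[i % SQ_ENTRIES].addr] = sq->entry[i % SQ_ENTRIES].data;
    sq_squash(sq);
}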
 
Store where? Into a renamed register... sure. As long as nothing outside of the speculative stream depends on it, you can just throw the contents away.

Into memory? No.

But load/store architectures don't store into registers -
load/store are memory operations.


It's not uncommon to have a dedicated store queue that holds speculative stores (for use in short-circuiting subsequent loads). In the old days we used to write speculative stores into clean cache lines (and mark them dirty and speculative), as I described above.

Thanks, that makes sense to me.
 
It's not uncommon to have a dedicated store queue that holds speculative stores (for use in short-circuiting subsequent loads). In the old days we used to write speculative stores into clean cache lines (and mark them dirty and speculative), as I described above.

The first isn't memory. It is a mechanism to hold the intermediate result, a hodge-podge between special registers and memory. The latter doesn't really work, as you also pointed out above. Once you can bring multiple contexts into execution it breaks down (you're sharing something that is specific to one execution context). If it can't run Unix (or any other multiprocessing/multitasking OS, let alone go to a multicore implementation), that really isn't a real product, IMHO. A nice hack perhaps, but not something you're going to release.
 
The first isn't memory. A hodge-podge between special registers and memory. The latter doesn't really work, as you also pointed out above. Once you can bring multiple contexts into execution it breaks down (you're sharing something that is specific to one execution context). If it can't run Unix (or any other multiprocessing OS, let alone go to a multicore implementation), that really isn't a real product, IMHO. A nice hack perhaps, but not something you're going to release.

It can run UNIX. By multiprocessing I mean multiple processors, not multiple processes. It works fine with multiple processes, just not multiple processors that require a coherent cache representation. It's already gone to release. I've worked on processors that have done this.

Even in a multiple-processor case it works if you are willing to get sufficiently ugly. There are non-coherent cache implementations. So processor A speculatively writes into a clean cache line, marks it as dirty and speculative. If the cache is not shared by the other processors, processor B has its corresponding cache line (if it has one) marked as speculatively out-of-date. (If they share a cache, it's moot). Thus there is no stall unless and until processor B has to read or write that cache line. This may not happen before the speculation is resolved, so no need to pre-emptively stall processor B. And, of course, processor B might not even be caching the speculative address, in which case there's no reason at all to stall.
 
This may not happen before the speculation is resolved, so no need to pre-emptively stall processor B. And, of course, processor B might not even be caching the speculative address, in which case there's no reason at all to stall.

Reading this thread, I realized that it's been ages since there's
been a CISC vs RISC debate here.

Maybe people do learn, after all. ;)
 
Reading this thread, I realized that it's been ages since there's
been a CISC vs RISC debate here.

Maybe people do learn, after all. ;)

I always laughed at those debates. My Ph.D. involved a RISC processor, and I designed PowerPCs and SPARCs as part of my employment. I also designed x86 processors as part of my employment.

Under the hood, not a heck of a lot of difference, and compilers love x86.
 
It can run UNIX.

Is this still about ARM? If so, I don't know why that was even a question. Corel had a machine called the Netwinder, a DEC StrongARM running Linux. DEC's StrongARM evaluation board ran a flavor of BSD. This was 10-15 years ago.
 
Is this still about ARM? If so, I don't know why that was even a question. Corel had a machine called the Netwinder, a DEC StrongARM running Linux. DEC's StrongARM evaluation board ran a flavor of BSD. This was 10-15 years ago.

No, it's about a particular form of speculative store instructions.
 
Definitely access speed. I believe hard disk capacity has actually increased at an exponential rate exceeding both CPU speed and RAM speed. Hard disk latency, on the other hand, has only decreased by a factor of a few since the early 1990s.

I'm curious to see how SSDs affect things too. I've run benchmarks that suggest that disk swapping performance on Mac OS X is limited by disk bandwidth, not latency, and so SSDs' biggest advantage (the latency) wouldn't help there, but the increased bandwidth would.

On the other hand, maybe you don't want to use an SSD to back virtual memory, since it could negatively affect the longevity of the drive? I really don't know anything about SSDs though.

That's a great point.

I know that older flash memory technology had limits on how many writes could occur before you reached the limits of the chips. I have also read articles claiming 5-30 years of continuous writes, depending on the technology and components used within a modern SSD.
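
For a rough sense of where figures like that come from (everything below is an assumed illustration, not the spec of any particular drive): multiply the per-cell erase endurance by the capacity to get the total bytes the drive can absorb with ideal wear levelling, then divide by a sustained write rate.

#include <stdio.h>

int main(void)
{
    /* Assumed illustrative figures only: */
    const double capacity_bytes = 64e9;      /* 64 GB drive                    */
    const double erase_cycles   = 10000.0;   /* program/erase cycles per cell  */
    const double write_rate_bps = 2e6;       /* ~2 MB/s sustained writes       */

    /* With perfect wear levelling every cell can be rewritten erase_cycles
       times, so the drive absorbs roughly this many bytes over its life: */
    const double total_writable = capacity_bytes * erase_cycles;   /* ~640 TB */
    const double seconds = total_writable / write_rate_bps;
    printf("~%.0f years of continuous writes\n", seconds / (365.25 * 24 * 3600));
    return 0;
}

With those made-up inputs the answer comes out around ten years; push the write rate up or the endurance down and you land at the low end of the range you mention.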

Truth is, I am, like yourself, not 100% up to speed on this technology; perhaps I should hit the books again too. :)
 
That's a great point.

I know that older flash memory technology had limits on how many writes could occur before you reached the limits of the chips. I have also read articles claiming 5-30 years of continuous writes, depending on the technology and components used within a modern SSD.

Truth is, I am, like yourself, not 100% up to speed on this technology; perhaps I should hit the books again too. :)

In the case of the iPad, does it matter what technology is in use? Is Apple using the same flash memory that the iPhone/iPod (nano, shuffle, and touch) use? That would be closer to what is used in SD cards than what is used in SSDs, or am I totally off base?
 
In the case of the iPad, does it matter what technology is in use? Is Apple using the same flash memory that the iPhone/iPod (nano, shuffle, and touch) use? That would be closer to what is used in SD cards than what is used in SSDs, or am I totally off base?

Yes, we have gone a little off topic. We were talking about paging, and how iPhone OS really can't/doesn't do that. We then migrated to talking about "what if you paged to an SSD", and you can see how that ended. :eek:

Apple doesn't list detailed tech specs, but the industry believes that the iPad uses NAND flash, which is the same as (or similar to, depending on exactly which chips are used) what is in the iPhone and iPod touch.
 
A few people have been saying that the iPad will still need a computer to run.

I hope it's not the case though except for optional file transfers.

If you can now create an iTunes account on the iPhone (and therefore the iPad), and if you assume for a moment that all the content you want will come via Safari/iTunes and that all music and video can be streamed at will, is there really anything it needs to be connected to a computer for?

I can't see why it won't work straight out of the box and never need to be connected at all.
 