comparing level 1 cache and 2 cache mac and pc...


howard
Nov 12, 2003, 10:53 AM
so i always hear about how good level 2 cache is and how more is better and so on... and i've experienced it too, using powerbooks and imacs that have almost the same insides except the powerbook has more L2 cache. well, how important is level 1 cache? there's certainly a lot less of it: 32k/64k on the new G5. is it important? is there a reason there isn't more? what exactly does it do?

also, just for comparison, i checked on the P4's cache. it appears to have 512k of L2 like the g4s and g5s but only 8k of L1 cache!! is that a big disadvantage? I'm curious to learn more about all of these processors. and if there's any info you can point me to about the best athlon chips and itanium and xeon i would appreciate it; info on them is hard to find other than their clock speeds... i like how apple has EVERYTHING about their chips on their tech specs page.

ddtlm
Nov 12, 2003, 11:48 AM
Cache can be exceedingly complex. This Ars Technica article is probably still pretty up to date:

http://arstechnica.com/paedia/c/caching/caching-1.html

The P4's tiny L1 cache is partly in order to make it work super quickly, and partly because of the trace cache design, which is unique (instructions are decoded into some other strange format and stored that way). The size of L1 is important, but generally a well-optimized processor is designed around the strengths and weaknesses of its L1, so a smaller L1 may have been selected for some other benefit.

You can find out a lot about the processors from Intel, AMD and those types of companies by going to their websites and digging around in the products pages, looking for technical specs documents (usually PDFs). You can find a lot of info there.

Powerbook G5
Nov 12, 2003, 12:37 PM
If you have a superfast processor, then relying on high amounts of cache isn't as important since the processor, in the case of the 3+ GHz P4s, is fast enough that it doesn't need to rely on heavy amounts of cache.

ksz
Nov 12, 2003, 01:18 PM
Originally posted by Powerbook G5
If you have a superfast processor, then relying on high amounts of cache isn't as important since the processor, in the case of the 3+ GHz P4s, is fast enough that it doesn't need to rely on heavy amounts of cache.
Not exactly. L1, L2, and even L3 caches are designed to reduce the effects of latency between main memory and the CPU. If memory access operated at the same speed as the CPU, there would be no need for a cache. A CPU running at 1 GHz has a clock cycle of 1 ns. If main memory access takes 50 ns, for example, the CPU has a lot of dead time on its hands. After all, a CPU performs three primary functions:

1. Fetch (memory read)
2. Execute
3. Store (memory write)

CPU caches minimize but never completely eliminate the effects of latency. However, managing the cache is nontrivial and there are tradeoffs between cost, power consumption, cache management complexity, and other factors.
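
The 1 GHz / 50 ns arithmetic above can be worked through in a few lines. This is only a sketch of the example numbers from the post; the function name `stall_cycles` is made up for illustration.

```python
def stall_cycles(clock_hz: float, memory_latency_ns: float) -> float:
    """Clock cycles that go by while one main-memory access completes."""
    cycle_time_ns = 1e9 / clock_hz          # 1 GHz -> 1 ns per cycle
    return memory_latency_ns / cycle_time_ns

# The example from the post: 1 GHz CPU, 50 ns main memory.
print(stall_cycles(1e9, 50))
# Same memory behind a CPU clocked twice as fast: twice as many cycles lost.
print(stall_cycles(2e9, 50))
```

The second call suggests why a faster clock makes the wait relatively worse, not better: the memory takes the same time, but each nanosecond is now more cycles.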

Actual CPU performance is determined not only by the size of the various caches, but also by the hit ratio. Tight program loops, for example, can be fully loaded and executed from cache (high cache hit ratio). Programs that have erratic memory access characteristics will encounter a high frequency of cache misses, each forcing a main memory I/O.
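
To make the hit-ratio point concrete, here is a toy simulation of a direct-mapped cache. It is a rough sketch, not a model of any real processor: the cache geometry (256 lines of 32 bytes), the address patterns, and the name `hit_ratio` are all assumptions picked for illustration.

```python
import random

def hit_ratio(addresses, num_lines=256, line_size=32):
    """Hit ratio of a toy direct-mapped cache over a list of byte addresses."""
    stored = [None] * num_lines        # block number currently held by each line
    hits = 0
    for addr in addresses:
        block = addr // line_size      # which memory block the byte lives in
        index = block % num_lines      # the one cache line that block can use
        if stored[index] == block:
            hits += 1
        else:
            stored[index] = block      # miss: load the block from main memory
    return hits / len(addresses)

# Tight loop: the same 1 KB region walked 100 times, 4 bytes at a time.
tight_loop = [a for _ in range(100) for a in range(0, 1024, 4)]

# Erratic access: random addresses scattered over 16 MB (fixed seed).
rng = random.Random(0)
erratic = [rng.randrange(16 * 1024 * 1024) for _ in range(len(tight_loop))]

print(f"tight loop hit ratio: {hit_ratio(tight_loop):.4f}")
print(f"erratic hit ratio:    {hit_ratio(erratic):.4f}")
```

In this toy setup the tight loop only misses on its first touch of each 32-byte block, while the scattered accesses almost never find their block already loaded.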

Today's CPUs also feature speculative execution, branch prediction, and deep pipelines to help cope with this. In real-world terms, this means that the CPU, in Step 1 above, will not only fetch the instruction it needs right now (as determined by the so-called "program counter"), but will also pre-fetch a bunch of other instructions from main memory, both to hide memory latency and to keep its pipeline filled.

This can have negative effects, too. An incorrect branch prediction will cause the processor to flush the pipeline and try again, an expensive operation.
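
A back-of-the-envelope way to see why a flush is expensive, and why it hurts more on a deep pipeline, is the usual cycles-per-instruction estimate. Every number below (20% branches, 5% mispredicted, 10- and 20-cycle flush penalties) is an assumption for illustration, not a measured figure for the P4 or any other chip.

```python
def effective_cpi(base_cpi, branch_fraction, mispredict_rate, flush_penalty_cycles):
    """Average cycles per instruction once misprediction flushes are counted."""
    flushes_per_instruction = branch_fraction * mispredict_rate
    return base_cpi + flushes_per_instruction * flush_penalty_cycles

# Hypothetical shallow pipeline: a flush costs about 10 cycles.
shallow = effective_cpi(1.0, 0.20, 0.05, 10)
# Hypothetical deep pipeline: same code, but a flush costs about 20 cycles.
deep = effective_cpi(1.0, 0.20, 0.05, 20)

print(f"shallow pipeline CPI: {shallow:.2f}")
print(f"deep pipeline CPI:    {deep:.2f}")
```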

My guess is that the deep pipeline of the Pentium 4 is partly the reason for the relatively small L1 cache on that processor.

Santiago
Nov 12, 2003, 01:28 PM
Actually, the faster the processor, the more important the cache is. The problem nowadays is that the limiting factor (the bottleneck) in computing is not how fast your processor can crunch data, but how fast you can get data in and out of the processor.

There is a tradeoff in memory between speed, size, and price. To keep price low, you can either have fast small memory or slow big memory. Or, you can have both, setting up multiple layers of caches that get increasingly fast. When you have a cache miss, then you only need to go up one level, for a small speed hit.
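
The layered setup described above can be put into numbers with the standard average-access-time calculation. This is a sketch under made-up figures: the access times and hit ratios below are assumptions, and `average_access_time` is a hypothetical helper, not taken from any spec sheet.

```python
def average_access_time(levels, memory_ns):
    """Average access time in ns for a cache hierarchy.

    levels: list of (access_time_ns, hit_ratio), fastest cache first.
    memory_ns: access time of main memory, which always "hits".
    """
    total = 0.0
    reach = 1.0                      # fraction of accesses that get this far
    for access_ns, hit in levels:
        total += reach * access_ns   # everyone who gets here pays this level
        reach *= (1.0 - hit)         # only the misses go one level further
    return total + reach * memory_ns

# Assumed numbers: 1 ns L1 and 10 ns L2, each catching 90% of what reaches it,
# in front of 100 ns main memory.
print(average_access_time([], 100))                      # no caches at all
print(average_access_time([(1, 0.9)], 100))              # L1 only
print(average_access_time([(1, 0.9), (10, 0.9)], 100))   # L1 + L2
```

With these assumed figures, each extra layer pulls the average closer to the speed of the fastest cache, even though most of the memory is still the slow, cheap kind.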

PowerPCs have a lot more visible registers than x86 chips, and registers are effectively Level Zero cache. The big increase in the speed of main memory on the G5s makes the amount of cache present slightly less important. The ratio of processor speed to memory speed on the G5s is between 2:1 and 2.5:1 (depending on whether you have a 1.6, 1.8, or 2.0 GHz model). This is the same as the ratio of chip to L2 cache speed as on some earlier G4s. Ultimately, what matters isn't really the number after the L, which just indicates that cache's position in the hierarchy, but the speed ratio between that cache and the main processor (or the next slowest cache).

ddtlm
Nov 12, 2003, 03:24 PM
Santiago:

The problem nowadays is that the limiting factor (the bottleneck) in computing is not how fast your processor can crunch data, but how fast you can get data in and out of the processor.
No, caches have kept the processor fed well enough that the speed of the processor is still very important.

PowerPCs have a lot more visible registers than x86 chips, and registers are effectively Level Zero cache.
However, PPC code is less dense than x86 code so it uses more space, and 64-bit code uses more space than 32-bit code.

The ratio of processor speed to memory speed on the G5s is between 2:1 and 2.5:1 (depending on whether you have a 1.6, 1.8, or 2.0 GHz model). This is the same as the ratio of chip to L2 cache speed as on some earlier G4s.
Whoa there, hold on, you've been confused by clock speeds. This is another version of the megahertz myth. Just because the FSB on a G5 runs at 1 GHz does not mean it is twice as fast as the L3 on a G4 clocked at 500 MHz. Sure, it has lots of bandwidth, but it's also got lots of latency. You ask for something from a G5's RAM and you get it back in something like 200 ns. You ask for something in a creaky old G4's L3 and you get it in 40 ns.

It's like saying a freight train is "fast" because it can move as much cargo in as much time as a whole fleet of FedEx planes. If you want your bits of cargo right after you order them, you go with FedEx. If you want a lot of cargo cheap, you go with the train.

A train works well when you know what you want a month beforehand. FedEx works well if you don't. Same for fast caches vs. fast system RAM.
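
The train-vs-FedEx point can be sketched as a simple "latency plus transfer time" model. The 200 ns and 40 ns latencies are the rough figures from the post; the bandwidth numbers (8 bytes/ns for the RAM path, 2 bytes/ns for the L3 path) are assumptions invented for the example, and `fetch_time_ns` is a hypothetical helper.

```python
def fetch_time_ns(num_bytes, latency_ns, bandwidth_bytes_per_ns):
    """Time to get num_bytes back: wait for the first byte, then stream the rest."""
    return latency_ns + num_bytes / bandwidth_bytes_per_ns

# Latencies from the post; bandwidths are assumed for illustration only.
RAM = dict(latency_ns=200, bandwidth_bytes_per_ns=8)   # the "freight train"
L3 = dict(latency_ns=40, bandwidth_bytes_per_ns=2)     # the "FedEx plane"

for size in (32, 1024 * 1024):   # one small request vs. a 1 MB bulk transfer
    ram = fetch_time_ns(size, **RAM)
    l3 = fetch_time_ns(size, **L3)
    print(f"{size:>8} bytes: RAM {ram:>10.0f} ns, L3 {l3:>10.0f} ns")
```

Under these assumptions the low-latency path wins easily for the small request, and the high-bandwidth path only comes out ahead on the bulk transfer, which is the "know what you want a month beforehand" case.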

acj
Nov 14, 2003, 03:08 AM
ddtlm is so smart