Lecture on cache memory
The whole idea of cache memory is to increase expense slightly but decrease average memory access time significantly. Say it takes 10 clock cycles for the CPU to access something in RAM, 6 cycles to access Level 2 cache, 3 cycles to access Level 1 cache, and one cycle to access internal registers. Now, if you can say what percentage of memory accesses will be satisfied by each of these levels of storage, you can calculate an average access time. Of course, RAM must be loaded from disk, but once the program is launched, this has probably been done, unless you're using virtual memory, in which case the average access time goes up in proportion to the fraction of accesses that have to be paged in from disk. Cache, too, must be loaded, and since it's smaller than RAM (L1 is smaller than L2, as well), its contents will likely change frequently over the course of program execution.
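To make that arithmetic concrete, here's a small C sketch of the weighted average. The hit fractions are made up purely for illustration (real numbers depend entirely on the program being measured); only the cycle counts come from the example above.

    #include <stdio.h>

    /* Access times (clock cycles) from the example above. */
    #define REG_CYCLES  1
    #define L1_CYCLES   3
    #define L2_CYCLES   6
    #define RAM_CYCLES 10

    int main(void)
    {
        /* Hypothetical fractions of accesses satisfied at each level;
           they must sum to 1.0. */
        double f_reg = 0.40, f_l1 = 0.40, f_l2 = 0.15, f_ram = 0.05;

        double avg = f_reg * REG_CYCLES
                   + f_l1  * L1_CYCLES
                   + f_l2  * L2_CYCLES
                   + f_ram * RAM_CYCLES;

        printf("average access time = %.2f cycles\n", avg);
        return 0;
    }

With those made-up fractions the average works out to 3 cycles, versus 10 if every access had to go all the way to RAM.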
Complicated strategies have been invented to manage the use of cache, but by and large, one can say that it all works on the principles of locality: something a program touched recently, and the things stored near it, are likely to be touched again soon. When a memory access cannot be satisfied by cache, the needed item comes from RAM, and a number of nearby items are fetched from RAM into cache along with it, on the assumption that they'll be needed soon. So the cost of a cache miss can be amortized over the several later accesses that "hit" because their targets were loaded along with the "missed" item.
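Here's a rough C sketch of that amortization at work; the array size is arbitrary, just big enough that it won't all fit in cache. Walking the array in row-major order reuses the neighbors fetched on each miss, while walking it column-wise touches a different cache line on nearly every access, so the same amount of work runs noticeably slower.

    #include <stdio.h>
    #include <time.h>

    #define N 4096                /* arbitrary; 64 MB of ints, far larger than cache */

    static int a[N][N];

    int main(void)
    {
        long sum = 0;
        clock_t t0, t1;

        /* Row-major walk: consecutive elements share cache lines, so most
           accesses hit items pulled in by an earlier miss. */
        t0 = clock();
        for (int i = 0; i < N; i++)
            for (int j = 0; j < N; j++)
                sum += a[i][j];
        t1 = clock();
        printf("row-major walk:    %ld ticks\n", (long)(t1 - t0));

        /* Column-major walk: each access lands on a different cache line,
           so the neighbors loaded on a miss are rarely reused. */
        t0 = clock();
        for (int j = 0; j < N; j++)
            for (int i = 0; i < N; i++)
                sum += a[i][j];
        t1 = clock();
        printf("column-major walk: %ld ticks\n", (long)(t1 - t0));

        return (int)(sum & 1);    /* keep the loops from being optimized away */
    }

The exact ratio varies by machine, but the second loop does exactly the same work; it just misses the cache far more often.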
Generally, there are only broad ways to answer a question like, "how much will this much cache gain me over a smaller amount on a faster processor?" It depends on the applications you run. Some apps - say, graphics or games - are written to take advantage of efficient special-purpose instructions that, say, a word processor wouldn't use. That can gain you significantly over and above any concerns about cache size.
The best way to answer such questions is to measure performance of the desired apps on particular machines. That said, more and faster is usually better.
Tom