8GB RAM in M3 MacBook Pro Proves the Bottleneck in Real-World Tests

Do any of the defenders of soldered RAM here have any evidence of statistically significant performance gains they'd like to share with the group?
Do we even know the timings of the LPDDR5 RAM in Apple Silicon computers? Apple uses 6400 MT/s RAM in the M3 generation. That speed is possible with memory modules, too. Everything else, like CAS latency, depends on the chips.
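For anyone wanting to sanity-check the numbers: the transfer rate and bus width together pin down peak theoretical bandwidth, independently of timings. A quick sketch (the bus widths here are illustrative examples, not claims about any particular chip):

```python
def peak_bandwidth_gbs(mt_per_s: float, bus_width_bits: int) -> float:
    """Peak theoretical bandwidth in GB/s: transfers per second times bytes per transfer."""
    return mt_per_s * 1e6 * (bus_width_bits / 8) / 1e9

# 6400 MT/s LPDDR5 as discussed above; the bus widths are illustrative.
print(peak_bandwidth_gbs(6400, 128))   # 102.4 GB/s on a 128-bit bus
print(peak_bandwidth_gbs(6400, 512))   # 409.6 GB/s on a 512-bit bus
```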
 
The entire point of Unified Memory vs standard RAM is that UMA removes bottlenecks on how much data can be shared between the CPU and GPU.
Let's get another thing out of the way: UMA is, as you correctly point out, about faster data sharing between CPU and GPU. It has nothing to do with how the RAM is physically attached. Those are two independent questions; they sit on orthogonal axes of the design space.
 
I'm sorry, did you read the article? Did you read the quotes that I copied?

The article you had linked says that "UNIFIED MEMORY IS RAM AND A HARD DRIVE IN A SINGLE MEMORY POOL" and that "unified memory [...] is non-volatile and will store its data without supplied power."

How is that kind of "unified memory" remotely relevant to what Apple calls Unified Memory?
That was in a separate section; the article also covered Apple's particular implementation of UMA, as well as Nvidia's use. It's a broad article on UMA, but it does specifically talk about Apple's UMA, and I linked to another article as well, which agrees on the basics of Apple's implementation. And yes, I did read both articles, and clearly saw that it had separate sections covering different implementations of the concept. I didn't say that everything in the article was relevant to Apple's implementation. Someone asked what the difference between UMA and normal RAM was, so I provided them with an article that covered different implementations of UMA and how they differ from standard RAM. It wasn't even meant as an argument for or against UMA; it was meant to be helpful for someone who was asking a question.
 
PCs can run RAM in modules at 6400MT/s and faster speeds.

That's completely irrelevant.
How is that completely irrelevant? You have this pattern of just calling things irrelevant, but someone literally asked “why does proximity matter?” And so I answered that closer proximity decreases latency. That’s the reason generally cited for close proximity.
 
That was in a separate section; the article also covered Apple's particular implementation of UMA, as well as Nvidia's use.
Well, one "Key Point" right at the beginning of the article is "Unified memory is a modern memory technology that combines RAM and a hard drive into a single memory pool. This newer approach, favored by Apple Inc., is thought to improve computing efficiency and energy consumption, usually coming at a premium." That's a direct quote. You are telling me that this is a good description of Apple's UMA?
 
And so I answered that closer proximity decreases latency.
If both memory on a module and memory soldered to the SoC can run at the same frequency, that is a clear indication that the decreased latency is not relevant. If it were relevant, the soldered RAM could run at a higher frequency. As it stands, you can get PCs with memory modules that run the RAM at higher frequencies than any Apple Silicon chip. Perhaps it's possible to run soldered RAM faster, but that observation is, at this point, irrelevant, because it isn't running faster.
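To make the frequency argument concrete: absolute access latency follows from the CAS latency in cycles and the transfer rate, and the formula doesn't care whether the chips are soldered or socketed. A sketch with a hypothetical CL value:

```python
def cas_latency_ns(cl_cycles: int, mt_per_s: int) -> float:
    # DDR transfers twice per clock, so one clock period is 2000 / MT/s nanoseconds.
    return cl_cycles * 2000 / mt_per_s

# Hypothetical CL of 32 at 6400 MT/s: the result is the same 10 ns
# whether the chips sit on a module or on the SoC package.
print(cas_latency_ns(32, 6400))  # 10.0
```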
 
Let's get another thing out of the way, UMA is, as you correctly point out, about faster data sharing between CPU and GPU. It has nothing to do with how the RAM is physically attached. Those are two independent questions. They are on orthogonal axis of the design space.
The RAM being part of the M chips reduces latency, hence allowing faster data sharing between CPU and GPU.
 
The RAM being part of the M chips reduces latency
We've been over this, it runs at the same frequency that memory modules easily achieve, too. Hence the minutely reduced latency does not make an appearance in the actual performance of the soldered memory of an Apple Silicon computer.
 
PCs can run RAM in modules at 6400MT/s and faster speeds.

That's completely irrelevant.
How much power does it take to increase the distance and up the speed? And can you point us to an Intel/AMD system that operates its memory at those speeds while on battery?

Or is that not relevant either?
 
That’s a new tactic: disagree with an article? Just say it can’t be right and must be written by AI.
Probably true in the case of that particular article. Other articles which may support your point (assuming that you don't think Apple's Unified RAM keeps its content when powered off) are available - that link was just a bad choice (although there's a real dearth of good links).

Examples of questionable/mangled statements in that article (none of which are really being disputed in this thread):
  • Random access memory is only capable of storing memory as long as power is supplied, unified memory on the other hand is capable of storing it in the absence of a power supply.
  • Unified memory is a modern memory technology that combines RAM and a hard drive into a single memory pool.
  • The CPU in a unified memory system accesses pooled memory resources. It has a much larger amount of memory it can use to perform operations more efficiently. As the hard-disk drive memory is integrated, the acquisition and use of data become intuitive and efficient.
From https://history-computer.com/unified-memory-vs-ram/

In other words, the linked article is a mish-mash of factoids that confuse Unified Memory/RAM, flash memory, Hybrid Storage and Hierarchical/Tiered storage - which would be entirely consistent with an LLM generating text based on an ambiguous brief like "unified memory". Unfortunately, this is the new reality - people really are using AI to bulk out websites and you have to check every article like a hawk to see that it's not superficially plausible AI-generated gibberish.

As for what you are actually saying: just to be clear, the actual RAM in UMA is not built into the CPU die itself; it is in the form of conventional LPDDR memory chips surface-mount soldered to the CPU package - you can see that from the link below, which shows the RAM can be upgraded, for a given value of "can" (it's not really a practical prospect).


...the main point of UMA is that both the CPU and the integrated GPU share the RAM, and that this is implemented differently and more efficiently than older systems with integrated graphics that allocated a chunk of system RAM as video RAM, so the CPU doesn't have to waste time copying data between RAM and VRAM. Mounting the chips directly on the package may make things a bit faster - or, more likely, save power. The physical RAM, though, is the exact same stuff that you get on ultraportable PCs that use LPDDR5x (on which it is still soldered directly to the logic board as close to the CPU as possible - a new "modular LPDDR" system is in the pipeline but won't be in the shops for Xmas).
 
Well, one "Key Point" right at the beginning of the article is "Unified memory is a modern memory technology that combines RAM and a hard drive into a single memory pool. This newer approach, favored by Apple Inc., is thought to improve computing efficiency and energy consumption, usually coming at a premium." That's a direct quote. You are telling me that this is a good description of Apple's UMA?
Sorry for my bad article choice. It was late here when I read through it and I didn’t catch those things, but looking back over it, you’re right. There are other articles, like the second one I shared with you, that make my point better.
 
I gave you another article, just move on. I tried to be helpful to someone else, and all you want to do is keep going on about it. I thought the article was helpful; I didn’t say everything in it was correct or that it was perfect. 🙄 I read it last night and didn’t catch anything like that on my first read-through; I can look back. I had a very hard time finding a good article that compared UMA and RAM, so sorry if it wasn’t perfect, but it was late here, and I was just trying to be helpful, not give someone a hobby horse to pummel me over. 🙄
You downvoted me for pointing out that hard drives have no place in a discussion of Apple's UMA. Not sure what to think.
 
Probably true in the case of that particular article. Other articles which may support your point (assuming that you don't think Apple's Unified RAM keeps its content when powered off) are available - that link was just a bad choice (although there's a real dearth of good links).

Examples of questionable/mangled statements in that article (none of which are really being disputed in this thread):


In other words, the linked article is a mish-mash of factoids that confuse Unified Memory/RAM, flash memory, Hybrid Storage and Hierarchical/Tiered storage - which would be entirely consistent with an LLM generating text based on an ambiguous brief like "unified memory". Unfortunately, this is the new reality - people really are using AI to bulk out websites and you have to check every article like a hawk to see that it's not superficially plausible AI-generated gibberish.

As for what you are actually saying: just to be clear, the actual RAM in UMA is not built into the CPU die itself; it is in the form of conventional LPDDR memory chips surface-mount soldered to the CPU package - you can see that from the link below, which shows the RAM can be upgraded, for a given value of "can" (it's not really a practical prospect).


...the main point of UMA is that both the CPU and the integrated GPU share the RAM, and that this is implemented differently and more efficiently than older systems with integrated graphics that allocated a chunk of system RAM as video RAM, so the CPU doesn't have to waste time copying data between RAM and VRAM. Mounting the chips directly on the package may make things a bit faster - or, more likely, save power. The physical RAM, though, is the exact same stuff that you get on ultraportable PCs that use LPDDR5x (on which it is still soldered directly to the logic board as close to the CPU as possible - a new "modular LPDDR" system is in the pipeline but won't be in the shops for Xmas).
Looking at the article again, it does seem to mishmash some things; sorry for my article choice, it was late here and I didn’t notice that on my first read-through. But as you pointed out, there are other articles that make the point I was trying to make. And as to your point about the RAM being built into the M chip or not: what I meant is that it’s built into the package. It’s not etched on the same wafer, I know it’s surface-mounted, but it’s part of the same package; that’s what I was talking about. 👍🏻
 
You downvoted me for pointing out that hard drives have no place in a discussion of Apple's UMA. Not sure what to think.
I edited that response because I looked back at the article, and it does in fact say those things. Sorry for my poor choice of article; as I explained, it was late here, and I didn’t catch those things on first read-through. I shouldn’t have dismissed you as quickly as I did, and I shouldn’t have met your valid criticisms of the article with sarcasm and by impugning your motives. I apologize for my original response; it wasn’t right of me.
 
We've been over this, it runs at the same frequency that memory modules easily achieve, too. Hence the minutely reduced latency does not make an appearance in the actual performance of the soldered memory of an Apple Silicon computer.
Greater distance increases latency and the power required to send high-speed signals to the memory. Shorter distance makes the connection more energy efficient and reduces its latency. Look it up; this is the number one reason cited for locating UMA as close to the CPU and GPU as possible.
 
Greater distance increases latency and the power required to send high-speed signals to the memory. Shorter distance makes the connection more energy efficient and reduces its latency. Look it up; this is the number one reason cited for locating UMA as close to the CPU and GPU as possible.
Let's try to come to an end here. The latency is reduced, but that doesn't translate to higher speed (as shown by the fact that the frequency isn't higher than for memory modules). Being more energy efficient sounds plausible. But the UMA architecture is independent of soldered RAM; from a performance perspective, nothing would change if UMA were combined with memory modules. Finally, here's a good reason for Apple to use soldered RAM that nobody has come up with yet, at least that I saw: the Max chips have a 512-bit memory interface. That's not easy to achieve with CAMM modules in a laptop! Some server chips use wider memory with modules, but the chips and the mainboards are massive.
 
Let's try to come to an end here. The latency is reduced, but that doesn't translate to higher speed (as shown by the fact that the frequency isn't higher than for memory modules). Being more energy efficient sounds plausible. But the UMA architecture is independent of soldered RAM; from a performance perspective, nothing would change if UMA were combined with memory modules. Finally, here's a good reason for Apple to use soldered RAM that nobody has come up with yet, at least that I saw: the Max chips have a 512-bit memory interface. That's not easy to achieve with CAMM modules in a laptop! Some server chips use wider memory with modules, but the chips and the mainboards are massive.
Yeah, the wider memory bus is certainly one of the biggest advantages. To my knowledge, cutting power consumption is actually why Apple started doing it several years ago: LPDDR is solder-only and has never had a SO-DIMM format, so any laptop that uses LPDDR has had to solder it to the board.

Having it on the SOC package itself doesn't really benefit latency as much as it's marketed. Electrical signals (assuming they propagate at half the speed of light) travel about 1.47 inches every CPU clock cycle at 4 GHz. Contrast that with the RAM access latency of 300+ CPU cycles on these chips (worse than the majority of x86 systems), and it's clear that the proximity of the RAM to the SOC isn't really what makes UMA fast.
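The back-of-the-envelope figures above are easy to verify; a sketch, assuming signals propagate at half the speed of light:

```python
C = 299_792_458             # speed of light in vacuum, m/s
signal_speed = 0.5 * C      # assumed propagation speed in board/package traces
cycle_s = 1 / 4e9           # one clock period at 4 GHz

inches_per_cycle = signal_speed * cycle_s / 0.0254
print(round(inches_per_cycle, 2))    # ~1.48 inches of trace per cycle

# Even saving a few inches of trace buys only a cycle or two, against
# 300+ cycles of total RAM access latency: well under one percent.
```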
 
I just took a look at your post history. Do you ever not complain about Apple?
I've used Apple products since the 90s, and there's a lot I like about their modern products. At a high level-
  • I prefer using iOS and macOS over Android and Windows. Wins for their operating systems.
  • The build quality is good on their laptops, and they're generally more pleasant devices to use than their Windows counterparts.
The biggest annoyances I have with Apple laptops are more recent trends. They've tricked people into thinking that they're an environmentally conscious company and that soldered components benefit their users. Neither is entirely true, and they've made their laptops unupgradable and disposable. They could last longer; they could be reused by other family members once their primary user upgrades; or, when the primary user's needs change, the RAM or storage could be upgraded, and would probably also be faster half a decade down the line. Cheaping out on ports and general corner-cutting also annoy me, for the price they charge.
 
I've used Apple products since the 90s, and there's a lot I like about their modern products. At a high level-
  • I prefer using iOS and macOS over Android and Windows. Wins for their operating systems.
  • The build quality is good on their laptops, and they're generally more pleasant devices to use than their Windows counterparts.
The biggest annoyances I have with Apple laptops are more recent trends. They've tricked people into thinking that they're an environmentally conscious company and that soldered components benefit their users. Neither is entirely true, and they've made their laptops unupgradable and disposable. They could last longer; they could be reused by other family members once their primary user upgrades; or, when the primary user's needs change, the RAM or storage could be upgraded, and would probably also be faster half a decade down the line. Cheaping out on ports and general corner-cutting also annoy me, for the price they charge.
Soldered components actually do benefit their customers. LPDDR RAM is a solder-only RAM type, but it has lots of benefits such as being faster and more power efficient. And the Unified Memory configuration makes it even better. It also reduces overall bulk, allowing thinner designs and more space to be used for other components such as bigger batteries. Upgrading RAM is kind of cool and all (I upgraded the RAM in my old Mid 2012 MacBook Pro), but very few people actually take the time to upgrade the RAM in their computer.
 
Soldered components actually do benefit their customers. LPDDR RAM is a solder-only RAM type, but it has lots of benefits such as being faster and more power efficient. And the Unified Memory configuration makes it even better.
The unified memory is a different topic.

Focusing purely on performance... how much performance gain do you think it gives? In what software? Do you think there is more than a 0.5% gain in any usage scenario?
 
The unified memory is a different topic.

Focusing purely on performance... how much performance gain do you think it gives? In what software? Do you think there is more than a 0.5% gain in any usage scenario?
If you mean soldering it to the SOC package, the 0.5% estimate is probably not inaccurate (even that might be an overestimation, but I suppose we don't have actual numbers to compare here). There may be additional factors that increase latency beyond just the distance traveled to the chips (I'm no expert, maybe someone can chime in), but if we're comparing latency improvements that would be caused by the reduced distance of electrical transmission, we're talking about 1 CPU clock cycle for every 1.47 inches of distance saved (assuming electrical transmission at around half the speed of light).

Uncached RAM accesses are already over 300 CPU cycles of latency on Apple Silicon, so it's safe to say that whatever latency might have been saved by cutting the distance is fairly small in comparison.

If, however, we're talking about soldered memory in general (whether to the motherboard or to the SOC), then there are benefits in terms of being able to run more than two memory channels easily (thus higher bandwidth). If they didn't use soldered memory, we'd have to have one SO-DIMM for each channel, and four memory channels would be four sticks of RAM. The Max variation of these chips runs a 512 bit bus, which would be a total of eight channels! That'd be impractical to try to do in a laptop with removable SO-DIMMs, to say the least. 😂
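The channel arithmetic is worth spelling out, assuming the standard 64-bit data width per (SO-)DIMM channel:

```python
BITS_PER_CHANNEL = 64        # standard non-ECC data width of one (SO-)DIMM channel

for bus_width in (128, 256, 512):
    channels = bus_width // BITS_PER_CHANNEL
    print(f"{bus_width}-bit bus -> {channels} channel(s)")

# A 512-bit interface like the Max chips' would need eight SO-DIMM slots.
```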
 
If you mean soldering it to the SOC package, the 0.5% estimate is probably not inaccurate (even that might be an overestimation, but I suppose we don't have actual numbers to compare here). There may be additional factors that increase latency beyond just the distance traveled to the chips (I'm no expert, maybe someone can chime in), but if we're comparing latency improvements that would be caused by the reduced distance of electrical transmission, we're talking about 1 CPU clock cycle for every 1.47 inches of distance saved (assuming electrical transmission at around half the speed of light).

Uncached RAM accesses are already over 300 CPU cycles of latency on Apple Silicon, so it's safe to say that whatever latency might have been saved by cutting the distance is fairly small in comparison.

If, however, we're talking about soldered memory in general (whether to the motherboard or to the SOC), then there are benefits in terms of being able to run more than two memory channels easily (thus higher bandwidth). If they didn't use soldered memory, we'd have to have one SO-DIMM for each channel, and four memory channels would be four sticks of RAM. The Max variation of these chips runs a 512 bit bus, which would be a total of eight channels! That'd be impractical to try to do in a laptop with removable SO-DIMMs, to say the least. 😂
Out of curiosity, which CPU tasks benefit from extreme RAM bandwidth? I know Nvidia manage a lot more bandwidth on their new cards than Apple manage on the Max, but that's for graphical tasks.
 
Out of curiosity, which CPU tasks benefit from extreme RAM bandwidth? I know Nvidia manage a lot more bandwidth on their new cards than Apple manage on the Max, but that's for graphical tasks.
I could be wrong, but as I understand it, heavier tasks such as rendering benefit from higher RAM bandwidth. As far as I’m aware, pretty much anything that needs a lot of RAM will benefit from faster RAM. And because it is Unified Memory, it will also have sizeable benefits for graphics-heavy tasks. Generally speaking, everything on the computer benefits from the higher speed; everything is snappier, even light tasks. You may argue over how “snappy” the system needs to be, but it is definitely faster and more energy efficient than SO-DIMM modules.
 
Out of curiosity, which CPU tasks benefit from extreme RAM bandwidth? I know Nvidia manage a lot more bandwidth on their new cards than Apple manage on the Max, but that's for graphical tasks.
On this, I am no expert. I do know AnandTech tried to max out the M1 Max's memory bandwidth a few years ago and could not find a CPU workload that would do it, although they apparently did get pretty close.

From what I understand, it's primarily the GPU-intensive workloads that are much more likely to max out such a large amount of memory bandwidth, but this might be a better question for @name99
 