Seems to me there are two things related to RAM going on here: UMA and in-package RAM. People use “UMA” to refer to both, but I think of them as two different issues.
With respect to the in-package RAM, it is true that the advantage there comes when you have an L2 cache miss. You add around 6 ps/mm of latency when your signals have to travel to distant RAM (plus you have to use bigger drivers, which take more power and get hotter). You typically get around that by using more cache (or more cache levels). This smart guy has something to say about all this: https://www.ecse.rpi.edu/frisc/theses/MaierThesis/
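To put the 6 ps/mm figure in perspective, here is a rough back-of-the-envelope sketch. The distances are made-up illustrative values, not measurements of any real board or package, and the function name is just something I picked. It only counts wire flight time, not driver, controller, or DRAM access time.

```python
# Rough estimate of the extra wire delay from moving RAM farther away.
# PS_PER_MM is the ~6 ps/mm figure quoted above; the distances below
# are hypothetical and chosen only for illustration.

PS_PER_MM = 6.0


def added_wire_delay_ps(distance_mm, round_trip=True):
    """Return the wire delay in picoseconds for a given trace length.

    A memory read needs the request to go out and the data to come
    back, so by default the distance is counted twice.
    """
    one_way = PS_PER_MM * distance_mm
    return 2 * one_way if round_trip else one_way


if __name__ == "__main__":
    for distance_mm in (5, 50, 100):  # hypothetical trace lengths
        delay = added_wire_delay_ps(distance_mm)
        print(f"{distance_mm:>4} mm -> {delay:.0f} ps round trip")
```

Even at the longest distance in that loop the wire delay alone is on the order of a nanosecond, so it only matters on accesses that actually miss the caches and go out to RAM.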
The other advantage of UMA is that it avoids time- (and power-) consuming memory transfers. If you have the CPU calculating information that the GPU needs to see, it can just write it into the shared memory, and there is no need to copy it from CPU memory to GPU memory over a bus.
Note that the information the GPU and CPU share may never even make it into the RAM in the package - it may be entirely within the caches (depending on how much information there is and how much other stuff is going on).
Yeah, that's why the first time I tried to explain it, I referred to it as "in package memory", but the distinction seemed lost on the audience. Being in package, I believe, would have only a minor impact on the tight loops and minimal data sets in the benchmarks I referenced. Based on the Geekbench reference data from the i5-6400, there are too few misses to make that kind of difference. The i9 and Xeon have significantly more cache than the i5-6400, so they would suffer even fewer misses.
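The "too few misses" argument can be sketched with the textbook average memory access time formula (hit time plus miss rate times miss penalty). Every number in the usage section is a placeholder I made up to show the shape of the effect; none of them are measured values for any of the chips mentioned here.

```python
# Textbook average memory access time (AMAT) model:
#   AMAT = hit_time + miss_rate * miss_penalty
# All inputs used in the demo below are hypothetical placeholders.


def amat_ns(hit_ns, miss_rate, miss_penalty_ns):
    """Average memory access time in nanoseconds."""
    return hit_ns + miss_rate * miss_penalty_ns


def speedup_from_faster_memory(hit_ns, miss_rate, slow_penalty_ns, fast_penalty_ns):
    """Ratio of AMAT with the slower RAM to AMAT with the faster RAM."""
    slow = amat_ns(hit_ns, miss_rate, slow_penalty_ns)
    fast = amat_ns(hit_ns, miss_rate, fast_penalty_ns)
    return slow / fast


if __name__ == "__main__":
    # Hypothetical: 1 ns cache hit, 100 ns vs 80 ns miss penalty.
    for miss_rate in (0.0, 0.001, 0.01, 0.1):
        s = speedup_from_faster_memory(1.0, miss_rate, 100.0, 80.0)
        print(f"miss rate {miss_rate:.3f}: speedup {s:.3f}x")
```

The point of the model is that the benefit of a shorter miss penalty scales with the miss rate: with no misses the speedup is exactly 1x, no matter how much faster the RAM is.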
I can't see how the unified memory architecture itself would have any impact on the single core CPU benchmark. Best I can tell, the benchmarks are based on external libraries (not Apple's APIs) and aren't making use of the coprocessors. There's a separate Geekbench Compute Workload that would exercise the GPU and the rest of the system.
From what I can tell, the 50% performance advantage the Firestorm core has over the i9 and Xeon is mostly due to the CPU itself.