I'm a little upset that I just spent $4700+tax on a 15" MBP, but I can console myself with the knowledge that I can still at some point get an eGPU if I need the extra graphics power.

I'm astonished that the new Vega graphics will be 60% faster than the 560X - can anyone explain how this is suddenly possible despite the strict power and thermal limits in the MBP?
 
Very useful info, thanks! I would also like to know about CUDA cores and stream processors. They're essentially unified shaders, but I have read they cannot be compared apples to apples. I read on enthusiast websites that Nvidia's shaders are larger and more complex, so they do more per cycle. So if the AMD card has 5,000 shaders, the Nvidia card can achieve the same with far fewer shaders; is this correct? Someone did a conversion with Maxwell and said every CUDA core is worth something like 2.25 stream processors.

This is where things get unnecessarily complex, simply because of the marketing BS. All modern GPUs are essentially multiprocessor units. An Nvidia Pascal GPU is built from dual-core processors (what they call an SM), with two 256-bit vector ALUs per core. Similarly, AMD Vega is built from NCUs, which are processors with either two 256-bit ALUs or four 128-bit ones (the references I had on this were not clear). Blocks of these processors are then bundled together in groups sharing access to memory/cache/texture units. If CPUs were marketed the same way, the Coffee Lake in my computer would be "96 high-performance FMA compute cores + 160 integer cores" or similar nonsense.
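To show the counting game in numbers, here is a minimal sketch. The SM/NCU layouts are just my rough reading from above, so treat all the unit counts and widths as assumptions, not official specs:

```python
# Sketch of the "core" counting game. Unit counts and ALU widths below
# are illustrative assumptions, not official specs for any real GPU.
def marketed_cores(processors, alus_per_processor, alu_width_bits):
    """Count 32-bit SIMD lanes, which marketing sells as 'cores'."""
    return processors * alus_per_processor * (alu_width_bits // 32)

# Hypothetical Pascal-style chip: 10 dual-core SMs, two 256-bit ALUs per core
print(marketed_cores(10 * 2, 2, 256))  # 320 "CUDA cores"

# Hypothetical Vega-style chip: 40 NCUs with two 256-bit ALUs each
print(marketed_cores(40, 2, 256))      # 640 "stream processors"
```

The same silicon could just as honestly be described as "20 processors" or "80 ALUs"; only the lane count produces the big marketing number.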

I have no idea if the capabilities of the ALUs are exactly the same these days; I assume they are at least very similar. Vega boasts something they call "rapid packed math", which essentially means its ALUs can process vectors of different types without a performance penalty (something Intel GPUs, for example, have supported for ages). If I understand correctly, Pascal has the same capability, so there is probably no difference here. At any rate, a "core" (meaning one ALU lane) on both Vega and Pascal is capable of 2 32-bit floating point operations per cycle, but that number is a bit of a hoax, since it simply reflects the fact that they can do an addition and a multiplication in a single cycle (fused multiply-add, or FMA). If all you do is additions or multiplications, it's only 1 operation per cycle.
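To make the FMA double-counting concrete, here is how the headline numbers are produced. The lane count and clock are made-up illustration figures:

```python
# How headline GFLOPS figures are computed. The lane count and clock
# below are made-up for illustration, not any real GPU's specs.
def peak_gflops(lanes, clock_ghz, fma=True):
    """Each 32-bit lane does one FMA per cycle; marketing counts that as 2 FLOPs."""
    ops_per_cycle = 2 if fma else 1
    return lanes * ops_per_cycle * clock_ghz

marketing_peak = peak_gflops(1024, 1.3)            # FMA counted as 2 ops per cycle
add_only_peak = peak_gflops(1024, 1.3, fma=False)  # pure adds or muls: exactly half
print(marketing_peak, add_only_peak)
```

If your workload can't pair every add with a multiply, you only ever see the lower number.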

What is quite important with GPUs, though, is the granularity of execution. The thing is, every ALU can do only one thing at a time. You can't have one part of an ALU adding two numbers while another part multiplies them. These are systems designed to process a lot of data at the same time using the same operation. This is great for graphics, where you usually want to apply the same operation in parallel (say, blend two images together). But this is also where granularity matters. A GPU might have very fast 1024-bit ALUs capable of processing 32 FP numbers at once, but if you don't have that many numbers to process, parts of the ALU simply won't do anything useful. If you want to add 16 numbers and then subtract 16 other numbers, a 1024-bit ALU would need two passes (two cycles). Two 512-bit ALUs would need only one cycle.
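The two-pass example can be sketched numerically. The lane widths are the made-up figures from the paragraph, and the model is deliberately simplistic (one op type per ALU per cycle):

```python
import math

def cycles_needed(items_per_op, ops, lanes_per_alu, num_alus):
    """Cycles to run `ops` independent operations of `items_per_op`
    elements each, when an ALU cannot mix two operations in one cycle."""
    cycles = 0
    free_alus = 0  # ALUs still available in the current cycle
    for _ in range(ops):
        alus_for_op = math.ceil(items_per_op / lanes_per_alu)
        if alus_for_op > free_alus:   # start a new cycle for this op
            cycles += 1
            free_alus = num_alus
        free_alus -= alus_for_op
    return cycles

# One 1024-bit ALU (32 lanes): add 16 numbers, then subtract 16 -> 2 cycles
print(cycles_needed(16, 2, 32, 1))
# Two 512-bit ALUs (16 lanes each): the two ops run side by side -> 1 cycle
print(cycles_needed(16, 2, 16, 2))
```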

From what I gather, Vega (and also Intel GPUs) is more flexible in how it does scheduling compared to Nvidia's architecture, which in turn allows it to pack work from multiple smaller, complex tasks more efficiently. This doesn't matter much for games or any other work where you are basically processing large arrays in parallel, but it makes a difference when you need to run less predictable code. This is also why Vega performs so well in complex compute tasks such as raytracing. But I admit that my knowledge on the topic is very basic.
I'm astonished that the new Vega graphics will be 60% faster than the 560X - can anyone explain how this is suddenly possible despite the strict power and thermal limits in the MBP?

We had some discussion on this on the previous pages. Essentially, it's a more efficient new architecture plus HBM2 RAM, which consumes significantly less power than GDDR5 while offering much higher bandwidth. The latter in turn means the GPU can be clocked higher to "compensate" for those power savings.
 
I have no idea if the capabilities of the ALUs are exactly the same these days; I assume they are at least very similar. Vega boasts something they call "rapid packed math", which essentially means its ALUs can process vectors of different types without a performance penalty (something Intel GPUs, for example, have supported for ages). If I understand correctly, Pascal has the same capability, so there is probably no difference here. At any rate, a "core" (meaning one ALU lane) on both Vega and Pascal is capable of 2 32-bit floating point operations per cycle, but that number is a bit of a hoax, since it simply reflects the fact that they can do an addition and a multiplication in a single cycle (fused multiply-add, or FMA). If all you do is additions or multiplications, it's only 1 operation per cycle.
In the first paragraph you mentioned marketing. And here is where it takes off...

Consumer Pascal GPUs have only compatibility-level FP16 support, compared to... the Quadro GP100 chip. GP100 has a 2:1 ratio of FP16 to FP32 cores. GP102, GP104, GP106, GP107: all of them have a 1:1 ratio of FP16 to FP32 cores ;).
What is quite important with GPUs, though, is the granularity of execution. The thing is, every ALU can do only one thing at a time. You can't have one part of an ALU adding two numbers while another part multiplies them. These are systems designed to process a lot of data at the same time using the same operation. This is great for graphics, where you usually want to apply the same operation in parallel (say, blend two images together). But this is also where granularity matters. A GPU might have very fast 1024-bit ALUs capable of processing 32 FP numbers at once, but if you don't have that many numbers to process, parts of the ALU simply won't do anything useful. If you want to add 16 numbers and then subtract 16 other numbers, a 1024-bit ALU would need two passes (two cycles). Two 512-bit ALUs would need only one cycle.
http://www.freepatentsonline.com/20180121386.pdf

This is what this patent tries to solve ;). Not feeding the ALUs enough work.
From what I gather, Vega (and also Intel GPUs) are more flexible in how they do scheduling compared Nvidia's architecture, which in turn allows them to pack work on multiple smaller, complex tasks more efficiently. This doesn't matter much for games or any other work where you are basically processing large arrays in parallel, but it makes a difference when you need to run less predictable code. This is also why Vega is performing so well in complex compute tasks such as raytracing. But I admit that my knowledge on the topic is very basic.
Nvidia GPUs pre-Turing/Volta have software scheduling. Turing has full hardware scheduling, which is why asynchronous compute finally works on Nvidia GPUs.
 
Any idea how much Apple will be charging for these two graphics cards in the UK?
 
Consumer Pascal GPUs have only compatibility-level FP16 support, compared to... the Quadro GP100 chip. GP100 has a 2:1 ratio of FP16 to FP32 cores. GP102, GP104, GP106, GP107: all of them have a 1:1 ratio of FP16 to FP32 cores ;).

There is a nice article on AnandTech about this. GP104 actually has only a 1:128 ratio of fp16:fp32 cores; Nvidia disabled fp16 cores in the consumer 1080 to force people to buy Quadro products. The 980 is 64x faster than the 1080 in synthetic fp16.

And it shows on Macs too: when I was looking for a GPU upgrade, benchmarks showed the 1080 lagging behind the 980 in compute tasks.

https://www.anandtech.com/show/10325/the-nvidia-geforce-gtx-1080-and-1070-founders-edition-review/5
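A back-of-the-envelope view of what those ratios mean. The fp32 peak figures below are approximate and only illustrative; the 1:128 native ratio and the 980's promoted fp16 behaviour are what the article describes:

```python
# Rough fp16 throughput comparison. The fp32 peak TFLOPS figures are
# approximate illustration values, not exact specs.
def fp16_peak_tflops(fp32_peak_tflops, fp16_per_fp32):
    """Native fp16 peak given fp32 peak and the fp16:fp32 rate ratio."""
    return fp32_peak_tflops * fp16_per_fp32

gtx_1080 = fp16_peak_tflops(8.9, 1 / 128)  # native fp16 cores only: tiny
gtx_980 = fp16_peak_tflops(4.6, 1)         # fp16 promoted to fp32, full rate
print(gtx_980 / gtx_1080)                  # roughly the 64x figure quoted above
```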
 
There is a nice article on AnandTech about this. GP104 actually has only a 1:128 ratio of fp16:fp32 cores; Nvidia disabled fp16 cores in the consumer 1080 to force people to buy Quadro products. The 980 is 64x faster than the 1080 in synthetic fp16.

And it shows on Macs too: when I was looking for a GPU upgrade, benchmarks showed the 1080 lagging behind the 980 in compute tasks.

https://www.anandtech.com/show/10325/the-nvidia-geforce-gtx-1080-and-1070-founders-edition-review/5
No, they have not disabled the cores ;).

The high-level architecture layout for GP102, GP104, GP106, and GP107 is THE SAME as it is for the Maxwell cards. Consumer Pascal GPUs were just Maxwell cards on a 14 nm process with very high core clocks. Turing is Volta on the compute front plus new graphics capabilities, which makes it Nvidia's first new GPU architecture since Maxwell.
 
Vega 12 clocked at 1.3 GHz is around 64% faster than the Radeon Pro 560X. In graphics tests it is between 57 and 72% faster.

Here is a direct comparison between the two: https://www.3dmark.com/compare/3dm11/12875524/3dm11/12886777#

The GPU itself appears to be slightly slower than the GTX 1060 Max-Q, and on par with Vega M GH.

Pretty nice computer you will have there: 85 W, 6-core/12-thread CPU, 16 GB of RAM, GTX 1060 Max-Q levels of performance. Not bad at all.

I really, really want this GPU on the desktop mainstream market, though.
https://www.3dmark.com/compare/3dm11/12875524/3dm11/12979937#
Here is a comparison between Vega 12 and the desktop version of the RX 560, using the same CPU as the Vega platform.

Vega 12 is 40% faster, which means it will be 20-25% faster than the GTX 1050 Ti. Right on par with Vega M GH/GTX 1060 Max-Q :).
 
https://www.3dmark.com/compare/3dm11/12875524/3dm11/12979937#
Here is a comparison between Vega 12 and the desktop version of the RX 560, using the same CPU as the Vega platform.

Vega 12 is 40% faster, which means it will be 20-25% faster than the GTX 1050 Ti. Right on par with Vega M GH/GTX 1060 Max-Q :).


That benchmark is rather fishy. The driver identifies as a Vega 20, and the benchmark was done using a desktop Core i3-4150. Why are engineers benchmarking with a low-end 2-core CPU when these things are clearly going to be paired with a 4-core i7-7700HQ or higher?
 
No, they have not disabled the cores ;).

The high-level architecture layout for GP102, GP104, GP106, and GP107 is THE SAME as it is for the Maxwell cards. Consumer Pascal GPUs were just Maxwell cards on a 14 nm process with very high core clocks. Turing is Volta on the compute front plus new graphics capabilities, which makes it Nvidia's first new GPU architecture since Maxwell.

Ok ;) "didn't implement native fp16 at the hardware level like they did on GP100". I didn't notice that he was comparing the fp16 throughput of promoted fp16 on the 980 to native fp16 on the 1080, and you're right: they are mostly the same. They use the same fp32 cores on consumer Pascal and just added 1 fp16 core for every 128 of the old fp32 cores. My bad; interesting article nevertheless.

We had some discussion on this on the previous pages. Essentially, it's a more efficient new architecture plus HBM2 RAM, which consumes significantly less power than GDDR5 while offering much higher bandwidth. The latter in turn means the GPU can be clocked higher to "compensate" for those power savings.

Here is the thing: the problem is dissipating the additional heat, and the VRAM chips are outside the GPU and not even attached to the heatsink. So switching to HBM2 will reduce power consumption, extend battery life, etc., but it won't have that much impact on GPU thermals unless the memory controller also produces significantly less heat. Looking at the raw power requirements one could expect gigantic improvements (a mere 4-5 W more between the 555X and 560X gives 20% more performance), but since the VRAM doesn't really directly increase the heat produced by the GPU core, the thermal efficiency gain will be limited to the Vega/Polaris efficiency ratio, which in the case of the big chips is about a 10% improvement.
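A rough sketch of the budget argument in numbers. Every wattage below is an assumption I picked purely for illustration, not a measured figure:

```python
# Illustrative power-budget split. All wattages are assumptions, not
# measurements: a 35 W package budget, with GDDR5 taking ~8 W vs
# HBM2 taking ~3 W for comparable bandwidth.
def core_power_budget(package_w, vram_w):
    """Watts left for the GPU core after the VRAM takes its share."""
    return package_w - vram_w

gddr5_core = core_power_budget(35, 8)  # 27 W left for the core
hbm2_core = core_power_budget(35, 3)   # 32 W left: room for higher clocks
print(hbm2_core - gddr5_core)          # ~5 W of extra headroom
```

The caveat from above still applies: if the VRAM isn't on the heatsink, those saved watts help battery life and clock headroom, but they don't directly cool the GPU die.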
That benchmark is rather fishy. The driver identifies as a Vega 20, and the benchmark was done using a desktop Core i3-4150. Why are engineers benchmarking with a low-end 2-core CPU when these things are clearly going to be paired with a 4-core i7-7700HQ or higher?
That's what I think too: that's a 75 W TDP compared to an alleged 35 W, with the 35 W chip being 40% faster. I'll let @koyoot choose my profile avatar for a month if that's true ;)
 
That benchmark is rather fishy. The driver identifies as a Vega 20, and the benchmark was done using a desktop Core i3-4150. Why are engineers benchmarking with a low-end 2-core CPU when these things are clearly going to be paired with a 4-core i7-7700HQ or higher?
Because it may end up on the desktop.

I would happily pay $150 for a 35 W, fanless Vega GPU that is 20-25% faster than the GTX 1050 Ti.
That's what I think too: that's a 75 W TDP compared to an alleged 35 W, with the 35 W chip being 40% faster. I'll let @koyoot choose my profile avatar for a month if that's true ;)
It is 35 W. If it were 75 W, it would be faster. Much faster. At least 30% faster than the 35 W TDP Vega 12.
 
I would happily pay $150 for a 35 W, fanless Vega GPU that is 20-25% faster than the GTX 1050 Ti.

And that's the problem with an engineer's train of thought vs. a gamer on a budget. A gamer is not going to want an efficient fanless GPU that's going to be paired with an entry-level Core i3 and sold at Walmart/Target for 500 bucks with a gamer-badge sticker slapped on it. Just a terrible idea IMO.
 
And that's the problem with an engineer's train of thought vs. a gamer on a budget. A gamer is not going to want an efficient fanless GPU that's going to be paired with an entry-level Core i3 and sold at Walmart/Target for 500 bucks with a gamer-badge sticker slapped on it. Just a terrible idea IMO.
And who said that's the config you will get? Who said that's the model/variant you will get on the desktop?
 
And who said that's the config you will get? Who said that's the model/variant you will get on the desktop?


What would the configuration be then? You know as well as I do that most people will choose the Nvidia card over AMD if they're at the same price point with similar performance. Go look at the Steam hardware stats...
 
What would the configuration be then? You know as well as I do that most people will choose the Nvidia card over AMD if they're at the same price point with similar performance. Go look at the Steam hardware stats...
I don't know. I just said what I would happily buy for $150, especially if it were fanless (I am building a fanless gaming desktop right now).

P.S. A 35 W TDP GPU that's 20-25% faster than the GTX 1050 Ti is pretty much unbeatable in performance per watt by anything on a 14 nm process ;). So it would cause quite a stir ;).
 
P.S. A 35 W TDP GPU that's 20-25% faster than the GTX 1050 Ti is pretty much unbeatable in performance per watt by anything on a 14 nm process ;). So it would cause quite a stir ;).


In terms of efficiency I agree. But gamers with desktops don't care about that. That's why we have 700+ watt power supplies. My GPU even lights up in different colors, and I have a remote control to change the colors too. Does your GPU have LED lighting and change colors?
 
In terms of efficiency I agree. But gamers with desktops don't care about that. That's why we have 700+ watt power supplies. My GPU even lights up in different colors, and I have a remote control to change the colors too. Does your GPU have LED lighting and change colors?
No, because I don't care about RGB lighting, and I am a gamer. I don't care about flashiness.

All I care about is a 144 Hz refresh rate, FreeSync, and fanless operation.
 
No, because I don't care about RGB lighting, and I am a gamer. I don't care about flashiness.

All I care about is a 144 Hz refresh rate, FreeSync, and fanless operation.

Makes perfect sense coming from an engineer, but the younger generation buying those products does care about RGB lighting, the bling-bling effect. It just sounds so boring. :oops:
 
Makes perfect sense coming from an engineer, but the younger generation buying those products does care about RGB lighting, the bling-bling effect. It just sounds so boring. :oops:
Not really. 80% of the market is mainstream. A lot of people are playing on a Ryzen 5 2400G APU, without a dGPU.

Only those who can afford high-end GPUs are burning money on them.
 
Makes perfect sense coming from an engineer, but the younger generation buying those products does care about RGB lighting, the bling-bling effect. It just sounds so boring. :oops:
My 4-year-old threw a fit yesterday: he didn't want to play on the Gigabyte P34 (i7-4770HQ + 860M, FHD IPS) and wanted to go back to the Asus G1S (Core 2 Duo + 9500GT with a low-res TN panel)... The Asus has green light strips on the side.
 
My 4-year-old threw a fit yesterday: he didn't want to play on the Gigabyte P34 (i7-4770HQ + 860M, FHD IPS) and wanted to go back to the Asus G1S (Core 2 Duo + 9500GT with a low-res TN panel)... The Asus has green light strips on the side.


I took my MSI laptop to work a while back with the keyboard cycling through different colors. People kept coming up with that "WOW" look, asking about it. They thought it was a four- or five-thousand-dollar computer. It's amazing how much more appealing you can make a product with some cheap LED lighting. The lit Dragon logo on the back made for some interested looks.
 
We had some discussion on this on the previous pages. Essentially, it's a more efficient new architecture plus HBM2 RAM, which consumes significantly less power than GDDR5 while offering much higher bandwidth. The latter in turn means the GPU can be clocked higher to "compensate" for those power savings.

Do you foresee more improvements like this in the years ahead? E.g., 2-3 years from now, will MBP GPUs be another 100%+ more power efficient, or will progress get stuck like it did with Intel CPUs?
 
Do you foresee more improvements like this in the years ahead? E.g., 2-3 years from now, will MBP GPUs be another 100%+ more power efficient, or will progress get stuck like it did with Intel CPUs?
LOL.

Let's see: AMD's 7 nm process allows them to pack a 2304-core GCN chip into 100 mm² and, within a 75 W TDP, clock it to 2 GHz. At that clock speed we are looking at a GPU as fast as, or slightly faster than... the GTX 1080.
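The arithmetic behind that estimate, using the FMA counting rule from earlier in the thread (the GTX 1080 peak is the commonly quoted fp32 figure; treat both numbers as ballpark):

```python
# GCN counts 2 FLOPs per core per cycle (FMA), so peak fp32 TFLOPS is:
def peak_tflops(cores, clock_ghz):
    return cores * 2 * clock_ghz / 1000

vega_7nm = peak_tflops(2304, 2.0)  # 9.216 TFLOPS at the speculated 2 GHz
gtx_1080 = 8.9                     # commonly quoted fp32 peak, ballpark
print(vega_7nm, vega_7nm > gtx_1080)
```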
 
Is the upcoming Vega MacBook Pro a good choice compared to a 2017 iMac Pro 8- or 10-core? I have been putting off upgrading due to the high cost of screwing up. My recently upgraded CAD software is running very slowly on my 2010 6-core cMP. I have been looking at the latest Geekbench scores, and the new MacBooks are right at the top. The Apple 14-day return policy is OK, but at 5 to 8 thousand dollars it's a lot of money to invest in NON-UPGRADABLE hardware. Apple is free to charge whatever they want, but they should give us all a little more notice when they are making our stuff old fast. A $5500 MBP has to go for a while without needing replacement.
 
Wouldn't an eGPU be a more cost-effective solution? Using a box like the Razer Core X with the card in it? I know it's not portable, but it will allow greater upgrade opportunities in the future, not to mention cross-compatibility with Windows. I suppose if you want that built into the MacBook Pro it's going to cost lots of cash, but I would worry about the heat being generated and how the MacBook Pro handles it; for example, will it thermally throttle the card?
 
LOL.

Let's see: AMD's 7 nm process allows them to pack a 2304-core GCN chip into 100 mm² and, within a 75 W TDP, clock it to 2 GHz. At that clock speed we are looking at a GPU as fast as, or slightly faster than... the GTX 1080.

I'm not clued up enough on this topic to interpret your answer. How much faster, compared to the GPU in the upcoming MBP, would you say mobile GPUs will get over the next 2-4 years?
 