Folks are upset at the 15-20% RT gains over the 40 series, having expected the gains to be larger.
RT performance is anywhere from ~15% better (at the “70” tier) to ~66% better (at the “90” tier).
Tensor core (“AI”) performance is up to ~2.8x faster; the biggest uplift is actually at the “70” tier, while the “90” class is ~2.54x higher.
At least, according to Nvidia's numbers.
Figuring out the shader (CUDA) core (i.e., floating-point) performance took some doing. Nvidia still hasn’t published those values, simply listing “Blackwell” under “Shader Cores.” Fortunately, the TechPowerUp GPU database has been updated.
5090 vs. 4090 = +~27%
5080 vs. 4080 S = +~8%
5070 Ti vs. 4070 Ti S (AD103) = +~1%
5070 vs. 4070 S = -~13%
--- 5070 vs. 4070 (AD104) = +~6%
Their FP performance calculations are based on Nvidia’s promised boost frequencies, which isn’t the wrong approach. But, as many of us know, the cards push beyond those clock speeds in the vast majority of scenarios, and I wanted to compare something closer to the actual (a.k.a. “real world”) performance difference. Of course, we don’t yet know any real-world frequency (or power consumption) values for the 50 series; benchmark embargoes lift starting Jan. 24. However, I was able to find a formula for FP performance.
s x 2 x c = g (GFLOPs)
g / 1000 = t (TFLOPs)
s = shader/CUDA core count
c = core (boost) frequency in GHz
(The 2 accounts for each core performing a fused multiply-add per clock, which counts as two floating-point operations.)
For example: in my observations, the RTX 4080 averages a 2.8 GHz core frequency vs. the 2.51 GHz promised by Nvidia.
So, the default calculation is 9728 x 2 x 2.51 / 1000 = ~48.8 TFLOPs.
My RTX 4080 cards seemingly average up to 9728 x 2 x 2.8 / 1000 = ~54.5 TFLOPs.
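If it helps, here’s a quick sketch of that math in Python. The 9728-core count and the 2.51/2.8 GHz clocks are just the RTX 4080 numbers from above; swap in whatever core counts and observed clocks you want to compare.

```python
# Rough FP32 throughput estimate: cores x 2 ops per clock (FMA) x clock in GHz = GFLOPs.
def fp32_tflops(shader_cores: int, clock_ghz: float) -> float:
    gflops = shader_cores * 2 * clock_ghz
    return gflops / 1000  # GFLOPs -> TFLOPs

def percent_diff(new: float, old: float) -> float:
    return (new / old - 1) * 100

# RTX 4080 example from above: 9728 cores, 2.51 GHz rated boost vs. ~2.8 GHz observed.
rated = fp32_tflops(9728, 2.51)     # ~48.8 TFLOPs
observed = fp32_tflops(9728, 2.8)   # ~54.5 TFLOPs

print(f"Rated:    {rated:.1f} TFLOPs")
print(f"Observed: {observed:.1f} TFLOPs ({percent_diff(observed, rated):+.1f}%)")
```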
One more thing...
It was soon obvious this generation wasn’t going to be a huge uplift, generally speaking, because Nvidia wasn’t able to substantially increase the core frequency. From the 30 to the 40 series, core frequencies rose roughly 50% across the board, which was a big improvement in itself.
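For a rough sense of that, here’s the same back-of-the-envelope check using the “80” tier’s rated boost clocks. These are the spec-sheet numbers as I remember them, so double-check them against Nvidia’s pages or the TechPowerUp database before leaning on them.

```python
# Rated boost clocks in GHz (from memory of the spec sheets; verify before relying on them).
boost_ghz = {"RTX 3080": 1.71, "RTX 4080": 2.51, "RTX 5080": 2.62}

for old, new in [("RTX 3080", "RTX 4080"), ("RTX 4080", "RTX 5080")]:
    gain = (boost_ghz[new] / boost_ghz[old] - 1) * 100
    print(f"{old} -> {new}: {gain:+.0f}% boost-clock change")

# Roughly +47% from the 30 to 40 series vs. only a few percent from the 40 to 50 series.
```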