There is no noticeable difference between 44.1kHz (what we mostly listen to now) and 96kHz. The human ear can't even hear above 20kHz; the rest is just there for filtering purposes.
That's not what Shannon's theorem says. It doesn't state that sampling at twice the frequency is enough to reproduce the signal perfectly. What it states is that if you sample at less than twice the frequency, there is no way you will be able to reproduce a signal at that frequency at all... It's a minimum, not a maximum.
For instance, a young person can hear a signal at 20kHz. To capture that signal, you need to sample at 40kHz at the very least. At that rate, the young person will still hear something, but you will have lost a lot of the signal's characteristics; for instance, you will not be able to tell whether the original was a sawtooth, a square or a sinusoid. So, significant information will have been lost.
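To make that concrete, here is a quick Python sketch of my own (numpy only; the helper names and the 20kHz tone with 12 printed samples are just illustration choices, not anything from a real recording chain). It counts how many sample points land in one cycle of a 20kHz tone at each rate and prints the raw samples of an ideal sine, square and sawtooth:

import numpy as np

def samples_per_cycle(sample_rate_hz, tone_hz):
    # How many sample points land inside one period of the tone.
    return sample_rate_hz / tone_hz

def sample_waveform(shape, tone_hz, sample_rate_hz, n):
    # Sample one of three ideal waveforms at the given rate (no anti-alias filter).
    t = np.arange(n) / sample_rate_hz
    phase = 2 * np.pi * tone_hz * t
    if shape == "sine":
        return np.sin(phase)
    if shape == "square":
        return np.sign(np.sin(phase))
    if shape == "sawtooth":
        return 2 * ((tone_hz * t) % 1.0) - 1.0
    raise ValueError(shape)

for rate in (44_100, 96_000):
    print(f"{rate} Hz sampling: {samples_per_cycle(rate, 20_000):.1f} samples per 20kHz cycle")
    for shape in ("sine", "square", "sawtooth"):
        x = sample_waveform(shape, 20_000, rate, 12)
        print(f"  {shape:8s}: {np.round(x, 2)}")

At 44.1kHz you get barely more than two points per cycle to describe the shape; at 96kHz you get almost five.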
That's why the first CD recordings sounded metallic. The solution, which is applied on all CDs, was to cut the signal off around 16kHz to avoid destroying the characteristics of the signal close to the Shannon frequency. That's also why 96kHz is interesting: it preserves quality in the upper part of the spectrum.
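A rough way to put numbers on that (my own back-of-the-envelope figures, assuming the audible band tops out at 20kHz): the anti-aliasing/reconstruction filter has to pass the audible band and reject everything above the Nyquist frequency, and the transition band it gets to work with is very different at the two rates:

AUDIBLE_TOP_HZ = 20_000  # assumed upper limit of hearing

for sample_rate_hz in (44_100, 96_000):
    nyquist_hz = sample_rate_hz / 2
    transition_hz = nyquist_hz - AUDIBLE_TOP_HZ
    print(f"{sample_rate_hz} Hz: Nyquist {nyquist_hz:.0f} Hz, "
          f"filter transition band {transition_hz:.0f} Hz wide")

About 2kHz of room at 44.1kHz versus 28kHz at 96kHz, which is why the filter at 44.1kHz has to be so steep.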
Moreover, 24-96 is not only about 96kHz; it's also about 24-bit samples, and that's where you gain a lot. The problem with CD, and with digital capture in general, is that the scale is linear, while most of our senses work on a logarithmic scale.
The result is that at the bottom of the intensity range you have very, very low resolution in your samples, while the human ear (or eye) still has good resolution there. This is especially visible in photography: if you brighten the shadows, you will see a lot of banding, because the sample resolution is very low in the shadows. It's the same problem with audio: CD killed dynamic range (hence the loudness war), because it doesn't cope well when there is a lot of dynamics in the low-volume parts.
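To put rough numbers on that (my own sketch; the -60dBFS figure is just an example of a quiet but still clearly audible passage): count how many linear PCM steps are left to describe a signal depending on how far below full scale it peaks:

import math

def steps_available(peak_dbfs, bits):
    # Number of linear PCM steps that fit between -peak and +peak
    # for a signal peaking at `peak_dbfs` relative to full scale (0 dBFS = 1.0).
    peak = 10 ** (peak_dbfs / 20)
    return math.floor(2 * peak * 2 ** (bits - 1))

for bits in (16, 24):
    for peak_dbfs in (0, -60):
        print(f"{bits}-bit, peak {peak_dbfs:>4} dBFS: "
              f"about {steps_available(peak_dbfs, bits)} steps")

Full scale always gets the full 2^16 or 2^24 steps, but a quiet passage at -60dBFS is left with about 65 steps on CD versus roughly 16,000 at 24-bit; that's the audio equivalent of the banding you see when you brighten the shadows.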