Your argument is based on pure sample theory, but it misses a few things.
Actually, he's pretty much spot on, IMO.
1) Dithering. It's common when mastering a CD to record a series of samples like "11, 12, 11, 12, 11, 12" so as to have the effect of "11.5, 11.5, 11.5". With a higher sample rate, dithering works better, so the high sample rate can actually improve the effective bit depth.
Any format that offers 96kHz or 192kHz sampling rates also typically offers greater than 16-bit word lengths. Achieving real-world dynamic range much beyond 20-bit at volume levels the human ear can tolerate is pretty much impossible, so why would you ever need greater "effective bit depth" than what's already possible using word length? Dithering does need time to effect perceptible changes in the noise floor, but I don't see how increasing the sampling rate beyond 44.1kHz, even at 16 bits, would be any more effective, given that the white noise itself is also encoded at 16 bits. Sony created Super-Bit Mapping (noise-shaping dither) many years ago, and I've seen evidence of real-world apparent reductions of the noise floor in critical areas of human perception of perhaps 3dB or so maximum (going by memory), which isn't going to replace 20-bit words any time soon, but might be worthwhile in some circumstances (it didn't use higher sampling rates, though). It also shifted the noise to another frequency range (i.e. it didn't actually increase overall dynamic range, but relied on human perception/audibility at one frequency range versus another to give apparent dynamic range improvements).
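For what it's worth, the dithering idea itself is easy to demonstrate, and nothing about it requires a higher sampling rate. Here's a minimal numpy sketch (illustrative values, obviously not mastering code): a level sitting exactly between two integer codes is simply lost to plain rounding, while TPDF dither preserves it as an average, at the cost of a raised noise floor. The "11, 12, 11, 12" pattern in the quote is just what dither produces statistically. The sampling rate only enters the picture with noise shaping, where you need spectral room above the audible band to push the noise into.

import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# An "analog" level of 11.5 -- exactly between two integer codes.
signal = np.full(n, 11.5)

# Plain rounding: every sample lands on the same code; the 0.5 is simply gone.
plain = np.round(signal)

# TPDF dither: add triangular noise spanning +/-1 LSB before rounding.
dither = rng.uniform(-0.5, 0.5, n) + rng.uniform(-0.5, 0.5, n)
dithered = np.round(signal + dither)

print("plain mean:   ", plain.mean())      # 12.0: the in-between value is gone
print("dithered mean:", dithered.mean())   # ~11.5: preserved as an average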
2) No one listens to pure samples.
WTF is a "pure sample" ???
A sample is a sample, regardless of whether it contains a square wave, a sine wave or a tiny part of a much more complex signal.
In most cases (except for a few well-informed audiophiles) they will apply a digital volume control, basically a multiplier or shifter. 24-bit samples in a way make up for the bits "lost" in processing due to the integer-math round-off error that occurs between most CDs and the D/A converter. This is the same reason engineers like to record in 24-bit: it allows for loss in mixing.
Who are 'they'? If you're worried about the effects of a digital volume control on the lowest bit, use an analog pot instead. But as you turn down the volume, you need less dynamic range to represent the signal in question, so it's pretty much tit-for-tat other than the noise floor of the DAC itself.
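The round-off effect being argued about is easy to sketch, by the way (numpy again; the -40dB setting and the 1kHz tone are arbitrary illustration values). Attenuating in a 16-bit integer domain costs you SNR relative to the now-small signal, while 24-bit words absorb the same round-off, which is the whole headroom argument for 24-bit processing. But note that even the 16-bit figure is measured against a signal you've already turned down 40dB, which is the tit-for-tat point.

import numpy as np

fs = 48_000
t = np.arange(fs) / fs
x = 0.5 * np.sin(2 * np.pi * 1000 * t)   # 1 kHz tone at half of full scale

def quantize(sig, bits):
    # Scale a float in [-1, 1) to signed integer codes of the given word length.
    return np.round(sig * (2 ** (bits - 1) - 1))

gain = 10 ** (-40 / 20)                  # a -40 dB digital volume control

def snr_after_gain(bits):
    ideal = quantize(x, bits) * gain     # infinite-precision result of the multiply
    err = np.round(ideal) - ideal        # what integer round-off throws away
    return 10 * np.log10(np.mean(ideal**2) / np.mean(err**2))

print("16-bit SNR after -40 dB: %.0f dB" % snr_after_gain(16))   # roughly 52 dB
print("24-bit SNR after -40 dB: %.0f dB" % snr_after_gain(24))   # roughly 100 dB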
In other words, digital volume controls aren't nearly as evil as some have made them out to be in certain "high-end" circles. In any case, my 2-channel listening room has a straight analog pre-amp (with motorized volume control), a custom active crossover and two amplifiers connected to Carver Ribbon speakers (AL-III), which are the same ribbons used in the $50,000-per-pair Genesis II speakers (albeit with additional drivers and a different woofer). So it's not like I'm listening to cheap speakers from Radio Shack here.
But I agree, overall few people will notice the improvement; CDs can be very good. Noticing the difference requires both good equipment (that most don't own) and years of experience with listening carefully. Many musicians can hear this well, as can some in the recording industry, but most consumers lack interest in developing such a skill, and it absolutely does not come without effort.
You can listen carefully all you want or even delude yourself all you want (an all-too-common thing in high-end audio circles, IMO), but it won't change an opinion into a fact. Double-blind tests (the only real proof of perception differences) always seem to expose snake-oil and high-end magazine claims (which seem to benefit valuable advertisers more than anyone else).
However, quite a few people made good money selling overpriced green marker pens (I think they went for $25-30 each) to outline the outer edge of their CDs, which somehow magically made things sound better (supposed jitter reduction, despite the fact it could not be measured in any way), among many other devices of dubious value. In other words, there's all kinds of "snake-oil" in the high-end audio arena. I haven't read magazines like Stereophile (and worse yet, Stereo Review, which I think was changed into a home theater rag) in a decade or so, but they perpetuated it by giving credence to nonsense and junk science in order to sell advertising to the people that make such crap. They didn't do the readers any real favors in the process. People who could have put their money into better speakers or room treatments put it into multi-thousand-dollar DACs and transports instead, which usually had less than 0.01dB of difference from a $25 DAC (i.e. virtually inaudible, and certainly at least an order of magnitude less than a typical loudspeaker would create over a given frequency range).
Oh, and then there are experiments where people have been found to be able to hear 40kHz and higher.
Care to quote some links to these experiments? I've found some evidence of the inner ear being able to hear ultrasonic frequencies, but not the outer ear.
Not directly, but let's say there are two instruments and one has an overtone (harmonic) at 40kHz and the other at 42kHz. You should expect to hear a beat frequency at 2kHz. I think this is one of the ways we locate the source of sounds: by hearing the beats between ultrasonic harmonics. If so, it is best to record these sounds. Try it yourself with a pair of audio signal generators. It's not pseudo-science; there is math to back it up.
A beat frequency requires a non-linear medium to create it. If that medium is in the room where the sound is being recorded, it will be picked up within the bandwidth constraints of audible hearing (and it might not be a good thing in some cases, such as unwanted distortion off some kind of treated walls that wasn't meant to be recorded in the first place).
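That requirement is trivial to verify numerically, incidentally. In the sketch below, the squaring just stands in for whatever non-linearity the room or ear might provide; a linear mix of 40kHz and 42kHz tones contains no energy at 2kHz at all, and the difference tone only appears after the non-linearity:

import numpy as np

fs = 192_000                 # high enough that the tones and their products fit below Nyquist
t = np.arange(fs) / fs       # one second, so FFT bins are 1 Hz wide
x = np.sin(2 * np.pi * 40_000 * t) + np.sin(2 * np.pi * 42_000 * t)

def level_at(sig, freq):
    # Spectrum magnitude at the given frequency (1 Hz bins).
    return np.abs(np.fft.rfft(sig))[int(freq)] / len(sig)

print("linear mix at 2 kHz: %.4f" % level_at(x, 2_000))        # ~0: no difference tone
print("after x**2 at 2 kHz: %.4f" % level_at(x * x, 2_000))    # ~0.5: difference tone appears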
Now if that non-linear medium is supposedly the ear drum itself, it seems at least theoretically possible that a beat frequency that exists in the real world (with, say, violins, which would produce many such harmonics) might be missed when listening to a recording. I've never seen any really convincing evidence that such a thing is audible in practical situations. At best, it would be hard to discern from other distortions in the playback chain compared to a real event in the room. Most microphones will not even record linearly above 25kHz, so getting a signal source to even compare might prove very difficult. I've certainly never read of any actual acoustic content above 40kHz on something like an SACD recording. If the source of sound is artificial, there is no basis at all for comparison to a real-world event.
What I'm saying is that evidence for any reproduction benefit of information above 20kHz is controversial at best and considered non-existent at worst. My Carver AL-III Ribbons begin to roll off above 17kHz and pretty much don't output anything significant above 22kHz, so it wouldn't do me much good to test any purported material anyway. My PSB speakers in my home theater room are good to perhaps 27kHz. Short of adding an ultrasonic tweeter, they're not going to produce much ultrasonic energy to even test it. Most speakers have similar limits (give or take), and thus the question of high sampling rates becomes moot once again. At the very least, I can be reasonably certain that the industry's use of such rates had nothing to do with the average reproduction chain (i.e. what's considered when mastering a typical audio recording). Certainly, I don't think ultrasonic reproduction is the first thing even high-end audio magazines (snake-oil oriented or not) look at when considering what makes a great-sounding speaker, and yet I've seen many put emphasis on the 192kHz sampling rate even though it's pointless compared to 96kHz (which is pretty pointless compared to 48kHz as well, but 192 is just absurd; show me ANY speaker of merit that can reproduce 96kHz, and then show me a signal that has any musical information at that frequency to reproduce in the first place).
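As for what 48kHz sampling actually loses versus 192kHz, the sampling-theorem half of the question is also easy to check (a scipy sketch; the 20kHz tone is an arbitrary stand-in for the very top of the audible band, and the FFT-based resample assumes a periodic buffer, which an integer number of cycles satisfies here). A round trip from 192kHz down to 48kHz and back leaves it essentially untouched:

import numpy as np
from scipy.signal import resample

fs_hi = 192_000
t = np.arange(fs_hi) / fs_hi
x = np.sin(2 * np.pi * 20_000 * t)   # 20 kHz tone: near the top of human hearing

down = resample(x, 48_000)           # 192 kHz -> 48 kHz (FFT-based)
back = resample(down, 192_000)       # 48 kHz -> 192 kHz

print("max round-trip error: %.1e" % np.max(np.abs(back - x)))  # ~1e-12: nothing below 24 kHz is lost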
In any case, I'm not worried about such purported content given the lack of sources, the lack of playback equipment (that I own) and the fact I'm not missing anything.
Ultimately, I still maintain that if Apple is considering 20-bit or 24-bit encoded files in the future, their reason is to re-sell the files based on most people's ignorance of the supposed benefit of such a change, not any actual sound quality differences. In fact, I can imagine them re-encoding 16-bit sources (i.e. no content above 16-bit present in the final file) and still giving them a 24-bit label. The key would be in the wording of the sale.
Lossless encoding (regardless of bit depth or rate) would be a far more substantial offering, IMO, especially given many CDs cost about the same as iTunes' compressed offerings and also give you a re-saleable medium (no signature watermarks, regardless of DRM), artwork, etc.