You're trying to use a very limited "model" of the real world to "prove" what is or isn't possible in the real world. Doesn't work that way.
Your "model" is simply a crude version of the real world. For example: do you actually think you could use your "model" to accurately and completely reproduce the sound of even a simple musical instrument, such as a flute playing any sort of musical piece?
One of the advantages of a digital system is that a tremendous amount of data can be recorded in a very short span of time. What you seem to be missing here is that each of your eardrums is only a single membrane... not a set of membranes with individual sensitivities. Everything you perceive comes through this fairly limited system that can really tell you only two things... frequency and amplitude. The ears themselves don't inherently record how many instruments might be oscillating to create the complex conglomeration of sounds you hear at any given moment... So how is a digital recording system any different?
The digital system also encodes the amplitude and frequency of whatever source it picks up, just as the ear does... From there, a continuous analog signal of great fidelity isn't that hard to reproduce. What's hard is putting together all the necessary technology in the right order to make it happen... but there are people who do these things for us, called electrical engineers.
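To make the "continuous signal from samples" point concrete, here's a rough sketch of textbook sinc (Whittaker-Shannon) interpolation in plain Python. The sample rate, the 440 Hz tone and the function names are just values I picked for illustration, and a real DAC uses a practical reconstruction filter rather than this brute-force sum... but the principle is the same: the value between two samples is recovered from the samples themselves.

```python
import math

def sinc(x):
    """Normalized sinc: sin(pi*x) / (pi*x), with sinc(0) = 1."""
    return 1.0 if x == 0 else math.sin(math.pi * x) / (math.pi * x)

def reconstruct(samples, fs, t):
    """Estimate the signal value at time t (seconds) from uniform samples."""
    return sum(s * sinc(t * fs - n) for n, s in enumerate(samples))

fs = 8000.0    # sample rate in Hz (assumed for illustration)
f = 440.0      # test tone, well below fs / 2
samples = [math.sin(2 * math.pi * f * n / fs) for n in range(2000)]

# Pick an instant that falls BETWEEN two samples, near the middle of the
# record, where cutting the infinite sinc sum short matters least.
t = (1000 + 0.37) / fs
estimate = reconstruct(samples, fs, t)
exact = math.sin(2 * math.pi * f * t)
error = abs(estimate - exact)
print(f"estimate={estimate:.5f} exact={exact:.5f} error={error:.2e}")
```

The small leftover error comes from using a finite record instead of an infinitely long one, not from the sampling itself.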
There are idiosyncrasies of a conventional analogue system which impede its fidelity, such as an elevated noise floor. The noise floor of a good digital system is a fraction of that, and additional techniques such as dithering can allow a system to actually reproduce amplitude values smaller than its smallest quantization step, below the noise floor!
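If the dithering claim sounds like magic, here's a small sketch you can run. It assumes a quantizer with a step of 1 (one "LSB"), a sine of amplitude 0.4 LSB, and triangular (TPDF) dither; the names and numbers are mine, chosen only for illustration. Without dither the tone is rounded away to pure silence. With dither the output is noisy, but the tone is still in there at the right level, which a simple correlation shows.

```python
import math
import random

def quantize(x):
    """Round to the nearest integer step (1 step = 1 LSB)."""
    return math.floor(x + 0.5)

def tpdf_dither():
    """Triangular-PDF dither spanning +/-1 LSB: sum of two uniform values."""
    return random.uniform(-0.5, 0.5) + random.uniform(-0.5, 0.5)

def tone_amplitude(samples, cycles):
    """Estimate the amplitude of a sine with `cycles` periods in the record."""
    total = len(samples)
    acc = sum(s * math.sin(2 * math.pi * cycles * n / total)
              for n, s in enumerate(samples))
    return 2.0 * acc / total

random.seed(0)                      # fixed seed so the run is repeatable
N, CYCLES, AMP = 20000, 200, 0.4    # AMP is below half a quantization step
signal = [AMP * math.sin(2 * math.pi * CYCLES * n / N) for n in range(N)]

plain = [quantize(x) for x in signal]                    # every sample rounds to 0
dithered = [quantize(x + tpdf_dither()) for x in signal]

plain_amp = tone_amplitude(plain, CYCLES)
dithered_amp = tone_amplitude(dithered, CYCLES)
print(f"without dither: {plain_amp:.3f}   with dither: {dithered_amp:.3f}")
```

The price of the dither is a slightly higher but benign, signal-independent noise level in place of quantization distortion.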
Back to a point a few paragraphs up... I said that there are really only two pieces of information your ear transmits to the brain. Interestingly, from these two pieces of information the brain deduces things like vector and distance, but that is an incidental evolutionary advantage: the pinna of the ear and your binaural hearing create minute differences in level, timing and phase between the two signals received by the ears. But again, what's important to note is that the sound as it's picked up by the eardrum has ONLY two characteristics... and as long as a sound reproduction system can reproduce these two characteristics correctly, everything else that arises from them is inherently reproduced, and faithfully so. (One cannot always control the acoustic environment in which playback takes place, so there will always be factors impeding accurate IMAGING of the sound, but this is a limitation inherent to the playback environment and not to the digital or analogue medium.)
There aren't any magical "presence" or "tonality" messages in a soundwave... presence, like tonality, pitch, timbre, vector, etc., is determined entirely by amplitude and frequency, the only two characteristics any propagated soundwave inherently possesses.
Well, engineers have reproduced that too. Engineers have developed digital signal processing methods built on these psychoacoustic principles which are perfectly capable of fooling the human brain into perceiving multiple sounds coming from locations other than the speakers from which they're really emanating. Again, nothing extremely fancy in terms of the concept is being done... just alterations to frequency and amplitude which produce the phase characteristics that make sounds appear to be projecting from different locations... and not merely one complex sound at a time, but many complex sounds at a time.
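For a feel of how simple the core idea is, here's a bare-bones sketch that pushes a mono sound toward the left by giving the right (far) ear a slightly delayed and quieter copy. The function name, the half-millisecond delay and the 6 dB level difference are illustrative assumptions on my part; real spatializers use measured, frequency-dependent head-related filters rather than a single delay and gain.

```python
import math

def place_source(mono, fs, itd_s, ild_db):
    """Return (left, right) with the source pushed toward the left ear.

    The right (far) ear gets the same signal delayed by itd_s seconds
    and attenuated by ild_db decibels.
    """
    delay = round(itd_s * fs)             # interaural delay in whole samples
    gain = 10 ** (-ild_db / 20)           # decibels -> linear amplitude
    left = list(mono) + [0.0] * delay     # pad so both channels match in length
    right = [0.0] * delay + [gain * s for s in mono]
    return left, right

fs = 48000                                # sample rate in Hz (assumed)
mono = [math.sin(2 * math.pi * 500 * n / fs) for n in range(480)]
left, right = place_source(mono, fs, itd_s=0.0005, ild_db=6.0)
print(len(left), len(right))
```

Play `left` and `right` over headphones and the tone should seem to sit off to the left, even though both channels carry the same waveform apart from timing and level.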
If, as you seem to suggest, no engineers knew how to reproduce sounds accurately in the digital world, then none of this would be possible.