Ken Rockwell - LOL! :)

Well known maybe....but that's about it.

Let me ask you just two more questions here.

IF 24-bit audio is technically redundant (as you so claim)...then WHY do we record that way in the studio?

and, why do so many people say that they can hear the difference between 16/44 and 24/96?

KEV

See here. http://www.soundonsound.com/sos/jun08/articles/qa0608_2.htm

In short, it allows for headroom during the recording process.

As for why people think they can hear the difference? The number one reason is that most '24-bit' audio people hear has been mastered differently from the CDs they are comparing it to. The recordings are literally different. Take the exact same master, render it at 16-bit and at 24-bit, and there will be no difference, because there can't be. 16 bits can easily encode the entire dynamic range of human hearing, let alone the best recordings. Extra room is just that. Extra.
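If you want to put a number on "because there can't be", here's a minimal numpy sketch (my own illustration, not from the linked article): it quantizes the same signal at both depths and prints the error floor each one adds.

```python
# Quantize the same "master" to 16 and 24 bits and measure the error floor
# each adds: roughly -101 dBFS and -149 dBFS respectively, both far below
# anything audible at sane playback levels.
import numpy as np

fs = 44100
t = np.arange(fs) / fs
master = 0.5 * np.sin(2 * np.pi * 1000 * t)  # identical source for both depths

for bits in (16, 24):
    scale = 2.0 ** (bits - 1)
    quantized = np.round(master * scale) / scale
    err = quantized - master
    floor_dbfs = 20 * np.log10(np.sqrt(np.mean(err ** 2)))
    print(f"{bits}-bit error floor: {floor_dbfs:.1f} dBFS")
```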

Also see this article http://sound.stackexchange.com/ques...room-vs-resolution-with-24bit-audio-recording

Specifically the following section:
Digital Audio is not Digital Photography


I would have assumed this to work in a similar way as digital images.

The analogy between the two is intuitive; in fact, it would be weird if people didn't make it. But digital audio and photography are not the same, not at the core at least. In fact, the analogy to camera sensors is a prime reason why people fail to grasp digital audio.
 
It's another essay! Depends on what you're recording...

We just want you to have the audio equivalent of Better Graphics. It's mostly about filters and "linearity" and stuff most listeners don't want to be bored with. So I'll just be 80% boring:

The quiet passages reveal the limits of 16 bit audio, but if a good master is properly dithered, you can "trick" the converters into tricking the listener into PERCEIVING a greater dynamic range in the recording:
It's all about this little thing called noise-shaped dither.
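For the curious, here's a toy Python sketch of the idea: TPDF dither plus first-order error feedback (real mastering-grade noise shapers use higher-order, psychoacoustically weighted filters, so treat this as a cartoon, not a product).

```python
# Toy noise-shaped dither to 16 bits: the TPDF dither linearizes the
# quantizer, and the first-order error feedback pushes quantization noise
# toward high frequencies, where the ear is least sensitive.
import numpy as np

def noise_shaped_quantize(x, bits=16, seed=0):
    scale = 2.0 ** (bits - 1)
    rng = np.random.default_rng(seed)
    y = np.empty_like(x)
    e = 0.0                                        # previous quantization error
    for n, s in enumerate(x):
        d = (rng.random() - rng.random()) / scale  # TPDF dither, ~1 LSB wide
        u = s - e                                  # first-order error feedback
        y[n] = np.round((u + d) * scale) / scale
        e = y[n] - u                               # error that gets shaped away
    return y

out = noise_shaped_quantize(0.25 * np.sin(2 * np.pi * 1000 * np.arange(44100) / 44100.0))
```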

And because the noise floors are dropping, and the mics, pres, converters, and mixers have been getting cleaner and more linear. In a session last summer, my drummer had a groove SPL of 122 dB unweighted, which is insanely loud but sounds gorgeous going through fancy condenser mics and into a tube preamp. All this racket is recorded in one giant room, like the olden days, thanks to my multi-SSD-equipped Mini and a decent Pro Tools (or Logic) system. All without audible hiss. To hear the hiss, you would have to turn up the headphones and/or monitors so loud that you would suffer permanent hearing damage. But on every track, all that "noise floor" sums into one stereo master bus.
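(A side note on how those per-track floors add up, in case it seems like 32 noisy tracks should be 32 times as hissy: uncorrelated noise sums in power, so the bus floor only rises by about 10·log10(N) dB. A quick sketch with made-up -100 dBFS track floors:)

```python
# Uncorrelated per-track noise sums in power on the mix bus, so 32 tracks
# raise the floor by ~10*log10(32) = ~15 dB, not by a factor of 32.
import numpy as np

rng = np.random.default_rng(1)
tracks = 1e-5 * rng.standard_normal((32, 44100))   # ~-100 dBFS floor per track

def rms_dbfs(x):
    return 20 * np.log10(np.sqrt(np.mean(x ** 2)))

print(f"single track floor: {rms_dbfs(tracks[0]):.1f} dBFS")
print(f"summed bus floor:   {rms_dbfs(tracks.sum(axis=0)):.1f} dBFS")
```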

Most people's noise floor will never let them enjoy it, but I err on the side of quality. The listening environment matters the most, though balanced XLR/TRS connections into good monitors matter, too. I wish everybody would just buy small powered, near-field monitors. They're not that expensive anymore, and you can find used pairs all the time at that big music store in your area.

24/88.2, 24/96, and 24/192 are really great for tracking sounds that aren't made up of 16/44.1, 8/22, or MP3 samples. When I track rock, I use 24/44.1 because my system will only allow 16 simultaneous tracks at 24/44.1 (I could use 24/48, but that's another story).

I use higher sample rates when I'm making an important recording. (Live classical music, rented TLM49's, a Fazioli F278, etc.)

When I'm editing a big mix of 32 tracks and 4 stereo effects busses, the dynamic range can start to get pretty tight on the master fader, so having a 24- or 32-bit mix bus is important if I'm going to give the listener any dynamic range. Sure, I can blast out a Gaga/Swift (who I love) type of mix that goes from only -6 dBFS to 0.0 dBFS inside every track, but that just doesn't sound as good as "not quite as loud, but certainly much roomier," achieved by not limiting as much on the master. What I want to give you is the sound that is as close to the "session" as possible, at least on headphones. A hi-res master using Pro Tools gives me that ability.
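Here's a tiny sketch of why the wide mix bus matters (invented numbers; the point is just that a float/32-bit bus keeps overshoots recoverable, while a narrow fixed-point bus clips them for good):

```python
# Sum 32 hot tracks. A fixed-point-style bus hard-clips anything over full
# scale; a float bus keeps the overshoot, so the master fader can simply be
# pulled down afterwards with nothing lost.
import numpy as np

rng = np.random.default_rng(2)
tracks = rng.uniform(-0.5, 0.5, size=(32, 44100))
bus = tracks.sum(axis=0)

clipped = np.clip(bus, -1.0, 1.0)               # narrow bus: peaks destroyed
trimmed = bus * (0.9 / np.abs(bus).max())       # wide bus: trim, nothing lost

print("samples clipped on the narrow bus:", int(np.sum(np.abs(bus) > 1.0)))
print("wide-bus peak after trimming:", round(float(np.abs(trimmed).max()), 2))
```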

That being said, I have to hand it to Apple for providing the really great "Mastered for iTunes" droplet and AAC codec. I've dropped 24/88.2 and big 24/44.1 master files onto that droplet and they sound nearly identical, though I think they should have made a version for 320 kbps / constant bit rate / normal stereo. But it's still pretty great.
 
For the same reasons pro equipment has always been of higher quality than typical consumer gear, regardless of the field.

1) Minimize cumulative degradation during the production process. A little dirt here, a little dirt there, and soon you have a real mess. (The cumulative effect of multiple out-of-tolerance conditions has to remain within the desired final quality). Per Wikipedia http://en.wikipedia.org/wiki/Audio_bit_depth:
Most processing operations on digital audio involve requantization of samples, and thus introduce additional rounding error analogous to the original quantization error introduced during analog to digital conversion. To prevent rounding error larger than the implicit error during ADC, calculations during processing must be performed at higher precisions than the input samples.
The fact that such additional precision is needed does not necessarily mean that the difference can be discerned by humans in an A/B comparison of two clean tracks.
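A quick sketch of that requantization point, with a few arbitrary gain changes standing in for "processing" (illustrative only): rounding back to 16 bits after every step accumulates error that a higher-precision workflow defers to a single final rounding.

```python
# Apply a chain of gain changes, re-quantizing to 16 bits after every step
# versus carrying full precision and rounding once at the end.
import numpy as np

fs = 44100
x = 0.5 * np.sin(2 * np.pi * 440 * np.arange(fs) / fs)
gains = [0.83, 1.17, 0.91, 1.06, 0.97]      # stand-in processing steps

def q16(v):
    return np.round(v * 32768) / 32768

stepwise, hires = x.copy(), x.copy()
for g in gains:
    stepwise = q16(stepwise * g)            # 16-bit bounce between every step
    hires = hires * g                       # stay in float until the end
hires = q16(hires)

ref = x * np.prod(gains)
for name, y in (("16-bit each step", stepwise), ("round once at end", hires)):
    err = y - ref
    print(name, f"{20 * np.log10(np.sqrt(np.mean(err ** 2))):.1f} dBFS")
```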

Not that it's an issue in this discussion, but it also means studio environments that are sufficiently isolated from noise sources (traffic rumble transmitted through the building structure, velocity noise from the HVAC ducts, humming fluorescent light ballasts, chairs that don't squeak beneath the butts of the string section....). There's no living room as noise-free as a well-built studio, but if we're just laying down a handful of acoustic tracks, close miking in the living room has been good enough for many commercial releases.

2) Marketing. You can't charge pro rates unless you have gear to match. An iPhone 6 on a $30 tripod may be good enough to shoot the job, but don't let the client see it.

I've been witness to 40 years of "how good is good enough" debate over digital audio, video, and still photography. In the beginning it was over the difference between 44.1 kHz and 48 kHz at 16 bits, linear. And, of course, how much "better" black vinyl sounded - I'd cringe over how bad the discs sounded compared to my digital master, while the "golden ear" beside me waxed rhapsodic over the "warmth" of the disk (let's see... additional compression to prevent cutter head over-excursion, added bass due to rumble, intermod, and phono cartridge hum pickup... yeah, definitely a fatter bottom end). And don't get me started on data destruction (lossy "compression")...

Owning high end equipment does not magically make the owner capable of discerning the difference. I'd wager that, for at least 90% of them, it's "The Emperor's New Clothes." Musicians, engineers and producers who can most certainly detect off-pitch, off-beat performances and instantly know the difference between a fresh set of bronze-wound strings and a set strung a few days previously, have a level of ear training that few others will ever attain. And even they (thanks to inevitable hearing loss, I'm no longer part of the "we") can be fooled into hearing a difference simply because they know the makes/models of the equipment/instruments, or the identity of the performers. There are countless double-blind tests that prove it.

The brain can be trained to discern the slightest variations captured by our eyes and ears, but once the signal falls outside the capabilities of those organs, the brain starts filling in the gaps. And the brain can't always be trusted.

Inevitably, the cost of technology drops, and today's impractical (say, 24-bit/192 kHz linear audio streamed over a cellular network and/or maintained in flash storage on a mobile device) will become practical. But will that necessarily be the best use for the bandwidth? John Woram (speaking at an AES convention back around 1975) cited "Foobini's Law" (not to be confused with Fubini's Law): "Not everything that can be done should be done."
 
This!

You said it much better than my essay, and you didn't even have to mention RIAA curves! :) A+
 
Today, for any serious music listener...256k, 320k MP3, or even 16/44 CD audio is simply not good enough.

24/96 is the standard whereby the digital realm meets the level of analog vinyl.

All the current 'portable' amp/DAC units (like the Cypher Labs Theorem 720 DAC, ALO International, etc.) provide 24/192...so when is Apple going to follow suit?

Plus, so do all the DAPs (Astell&Kern, Calyx M, Sony, FiiO), and now the new PONO Player does too.

IF Apple wishes to stay current/competitive in the future...it needs to activate 24/192 audio from its devices. Especially as several of the latest (made for iPod/iPhone) external amp/DAC units (i.e. Cypher Labs, shown below) provide hi-res audio quality.

All HD files (FLAC, AIFF, WAV, etc.) can be played using the Onkyo HD Player app.

The iPod Touch is a really neat/sleek music device...but, now it needs to get with the programme...

KEV

I'm sorry, but you have absolutely no idea what you're talking about.

There has never been a mass market audio format that plays at 24/96. Analog vinyl (as opposed to what other type of vinyl?) is a fantastic format, my personal favorite--and the favorite of most serious music fans--but it can't be compared to 24/96. That's a digital spec; analog isn't digital. Additionally, while vinyl is more enjoyable in a thousand ways, it simply doesn't sound as good as a CD...which also isn't 24/96 (it's 16/44.1).

Vinyl masters have to be prepared differently from their CD or iTunes counterparts. There are limitations to the format, so the engineer must create a master that takes that into account. Excessive dynamics or excessive bass will cause a stylus to jump. That has to be eliminated. Excessive high frequencies (particularly when going from digital recordings to lacquer masters) will sound terrible on vinyl. Unlistenably terrible. These things need to be rectified prior to pressing.

Long story short, digital files don't have those compromises to begin with. It doesn't make them any more enjoyable--but technically, they do sound better--or they're at least more accurate.

Audio files that are properly Mastered for iTunes (I own an MFiT studio) are mastered by the engineer to sound as good as possible within the format. Much like vinyl. Silly little number games that you find on the Internet can't quantify that. Additionally, Apple's AAC compression codec is much better than standard MP3. It's still a lossy format, but it's smarter than MP3. Dynamics work much better with it.

The headphone amps that you're talking about aren't mass market products. They're audiophile or audio professional products. They're not really designed for use with MP3 players--they're designed for use with high end stereos or studio monitoring. If they say they're "Made for iPhone/iPod" it's a marketing ploy to sell a few extra units to people who don't understand what they're buying.

iPod/iPhone are mass market products. The mass market doesn't care about audio quality. It hurts to say that as a pro in the business, but they don't. They never have and never will. They care about convenient access to music they're emotionally connected with.

If you're an audiophile, good for you. I think it's great when people can appreciate the work that goes into making a record--but don't make the mistake of assuming you speak for the world. You don't. It just happens to be what you're into. Want to know what headphone amp I use at home? None. Why? Because when I want to listen to music I want to enjoy it my way. You can enjoy it your way. Just don't assume that everyone should want or need what you like.
 
There seems to be a lot of information in this thread, but also a lot of opinion masquerading as information.

I can easily demonstrate to anyone the differences between sample rates (44.1 kHz vs. anything higher) and show that higher rates have several advantages: increased precision, particularly noticeable at higher frequencies, which results in better phase relations, fewer alias frequencies (typically harmonics caused by the sampling process), and more precise peaks.

The bit resolution increase from 16 to 24 not only increases dynamic range from softest to loudest, but also the precision within that range. With 16-bit you have a theoretical maximum range of 96 dB, with 65,536 different discernible volume levels. With 24-bit that increases to 144 dB of range (most hardware still can't meet the theoretical limits) and over 16 million volume levels. 8-bit, 16-bit, and 24-bit recordings all exhibit quantization noise at the lowest volume levels, but the higher the bit depth, the farther that noise is from the peak volume.
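(For anyone who wants to check that arithmetic, each bit doubles the level count and adds about 6.02 dB:)

```python
# Levels and theoretical dynamic range per bit depth: 20*log10(2**bits).
import math

for bits in (8, 16, 24):
    levels = 2 ** bits
    print(f"{bits}-bit: {levels:,} levels, "
          f"{20 * math.log10(levels):.1f} dB theoretical range")
```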

At some point if there is interest I'd be happy to make an example to share online which can show this. I've been teaching digital audio for over 20 years and show these examples to every class I've taught. I've yet to have a single student say they can't hear the difference in increased quality between sample rates or bit resolution.
 
Are you an audio person or a math person? We know from math that sampling at 44.1 kHz reproduces the waveform exactly - phase, peaks, and all - so I cannot agree with your claim. And as for bit depth, there is no recorded music that comes close to challenging 96 dB, so 144 dB is not necessary either. If you or anyone can hear a difference at all, the reason is compression, not sampling.

http://xiph.org/~xiphmont/demo/neil-young.html
 
I'm an audio person (if that's a thing). Been recording/working/teaching in studios for 40 years. Give me some time this evening and I'll post you a drawing that can show the simple differences in amplitude, phase, and aliasing.
 
I've been an electrical engineer for 32 years. I studied the math in school a long time ago but I haven't delved deeply into this for a long time. I sincerely would like to see your drawings.
 
Let's say this waveform is at 20 kHz (the thin sine wave in the background) and the sampling rate is 40 kHz (the points and darker lines). It makes the math easy with simple round numbers, but the results are the same. This is the least you can sample and still capture the motion of the waveform (the Nyquist limit). As you can see in the top example, if the two samples per cycle land precisely on the 90° and 270° points, you capture the amplitude of the waveform. However, if the samples aren't at the precise peak and trough, as in the second example, you might get the 45° and 225° samples instead. The waveform in the second example repeats at the same frequency, but its overall amplitude is reduced and its phase is shifted by 45°. In the third example, where samples are taken at the 0° and 180° points (still twice per cycle), you can end up with total silence.

Using 44.1 kHz sampling on a 20 kHz waveform will give you slightly more than 2 samples per cycle (2.205 samples per cycle), which means the sample points will constantly be shifting by several degrees through each cycle (about 163 degrees between each sample). So the waveform will constantly be shifting in and out of phase as well as varying in amplitude.

The D/A converters can also take this info and try to smooth it with oversampling and computation of the waveform's trajectory, but if the level ramps between samples, like in the picture, then you will get a waveform that produces some harmonics (a pure sine wave has none; it is a single frequency). Granted, for a 20 kHz waveform that first harmonic is above standard human hearing, but lower frequencies will produce harmonics that are audible. If the D/A converters hold a constant level between each sample, rather than ramping, you'll get something that looks more like a square wave.

That's all I've got time for now, but hopefully this makes some sense. I like a good discussion and if I've got my math or other observations wrong I'm happy to learn new tricks.
 

[Attachment: sampling.png]
Let's say this waveform is at 20 kHz (the thin sine wave in the background) and the sampling rate is 40 kHz (the points and darker lines). It makes the math easy with simple round numbers, but the results are the same. This is the least you can sample and still capture the motion of the waveform (the Nyquist limit). As you can see in the top example, if the two samples per cycle land precisely on the 90° and 270° points, you capture the amplitude of the waveform. However, if the samples aren't at the precise peak and trough, as in the second example, you might get the 45° and 225° samples instead. The waveform in the second example repeats at the same frequency, but its overall amplitude is reduced and its phase is shifted by 45°. In the third example, where samples are taken at the 0° and 180° points (still twice per cycle), you can end up with total silence.

Freqsound, thanks for the picture. There is a problem with your observation, and you touched on it: your sampling rate is too slow. The rate cannot be exactly twice the frequency of the signal; it must be higher than two times the highest signal frequency. That's critically important, and example 3 shows why. There are two waveforms that result in the samples you depicted--a 0 Hz sine wave and a 20 kHz sine wave. That's possible when you don't sample fast enough. But sample at greater than 40 kHz and there can only be one solution.

Using 44.1 kHz sampling on a 20 kHz waveform will give you slightly more than 2 samples per cycle (2.205 samples per cycle), which means the sample points will constantly be shifting by several degrees through each cycle (about 163 degrees between each sample). So the waveform will constantly be shifting in and out of phase as well as varying in amplitude.

The waveform drawn in the examples is not the waveform that results from the D/A conversion at all. It's not straight lines between points. When samples are taken at more than twice the signal frequency, there is one and only one waveform with no content above half the sample rate that hits all the points--the original waveform--and that is exactly reproduced by the D/A process. It has the same frequency, phase, and amplitude at all points, even between the sample points!
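For reference, that uniqueness property is the textbook Whittaker-Shannon reconstruction formula (standard sampling theory, not something invented in this thread): with sample period T = 1/fs and all content strictly below fs/2,

```latex
x(t) = \sum_{n=-\infty}^{\infty} x[n]\,
       \operatorname{sinc}\!\left(\frac{t - nT}{T}\right),
\qquad \operatorname{sinc}(u) = \frac{\sin(\pi u)}{\pi u}
```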

The D/A converters can also take this info and try to smooth it with oversampling and computation of the waveform's trajectory, but if the level ramps between samples, like in the picture, then you will get a waveform that produces some harmonics (a pure sine wave has none; it is a single frequency). Granted, for a 20 kHz waveform that first harmonic is above standard human hearing, but lower frequencies will produce harmonics that are audible. If the D/A converters hold a constant level between each sample, rather than ramping, you'll get something that looks more like a square wave.

D/A conversion doesn't ramp between points and it doesn't produce constant levels between points. It generates a smoothly varying analog waveform that passes through all of the sampled points, and the only waveform that can hit those points is the original analog waveform. I found a fantastic video a few weeks ago that illustrates it perfectly. Let me try to find it again. For now, please accept that you have a few misconceptions.

Edit: I found it! It was linked in an excellent article on xiph.org about 24/96 digital audio.
Here is the article: http://people.xiph.org/~xiphmont/demo/neil-young.html
Here is the linked video (plays in Chrome or Firefox): http://xiph.org/video/vid2.shtml
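And if anyone prefers to verify it numerically rather than by video, here's a small sketch of my own (it truncates the infinite sinc sum, so the error is tiny rather than exactly zero): sample a 20 kHz sine at 44.1 kHz with an arbitrary phase, reconstruct between the sample points, and compare against the original.

```python
# Sample a 20 kHz sine (arbitrary phase) at 44.1 kHz, rebuild the waveform
# *between* the sample points by summing sincs, and compare to the original.
# The "wobbling amplitude" from the connect-the-dots drawing never appears.
import numpy as np

fs, f, phase = 44100.0, 20000.0, 0.7
n = np.arange(4096)
samples = np.sin(2 * np.pi * f * n / fs + phase)

# fine time grid spanning the gap between samples 2000 and 2001
t = (2000 + np.arange(200) / 200.0) / fs
recon = np.array([np.sum(samples * np.sinc(ti * fs - n)) for ti in t])
truth = np.sin(2 * np.pi * f * t + phase)

print("max |error| between samples:", np.max(np.abs(recon - truth)))
# -> tiny, and it shrinks further as more sinc terms are included
```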
 
If I blindfold you and do listening tests, you will discern a big difference between a 16-bit and a 24-bit file.

However, you will not be able to tell a difference between 48k and 96k sample rates.

The bits are what's needed.


Best,
SvK
 
Miles Davis never would have allowed this.

If you're just listening to no-upper-harmonic-content music (most of the drum machines being reused to death today were sampled at 32 kHz and 12 bits; some of the really old stuff is 12 kHz and 8 bits), you could easily get away with tracking line-in at 16/44.1 with an SM57 for the vocalist. Maybe even record at 320 kbps MP3; nobody would know or care.

But not hearing the difference in the sample rates (and some still argue bit depths) when recording a real band or a concert grand or chamber music or a real singer with great mics and preamps? PEBCAM.

But it doesn't matter because most listeners have never sat in the chair in an editing suite. And now we all have to mix for the earbud crowd.

:)
 
Let's use the old photography analogy again here.
Are YOU seriously saying that a DX 12-bit/12 MP image...has no more micro-contrast, line-edge definition (i.e. clarity), detail, and presence...than an FX 16-bit/36 MP image does?

Well, as a 35+ year studio macro stock photog...I can tell you it sure as hell does.

It's the difference between a regular DSLR camera and a professional medium format system.

Here are a few of my studio macro images as a reference point.


http://kvincentphotography.ca/macro

http://kvincentphotography.ca/stackedimages

http://kvincentphotography.ca/designerflorals


KEV

Sorry, but if your best argument is a false equivalence, you're already in trouble.

FX/DX sensors have real, physical differences that can be measured and are visible to the human eye.
1. A larger sensor with the same number of pixels is more light sensitive (larger surface area to receive light).
2. A larger sensor with same-size pixels can have more of them, allowing for more detail and digital zoom. Note there is a limit to the glass, and you could run into a limit here also; a 100 MP sensor on your iPhone would be wasted, for example.
3. Depth of field - larger sensors have shallower depth of field. If you shoot stock photos, you know what I mean.

NONE of these arguments apply to the one you are trying to make.

You MIGHT try to make a case for 14-bit versus 12-bit RAW. The average human eye can distinguish about 4 million non-adjacent colors, but many more than that in adjacent colors. Depending on a condition known as tetrachromacy (where the rods contribute to color perception; an uncommon-to-rare condition, depending on who you believe), this is between 10 and 100 million colors. At 10 million, 12-bit RAW has you covered, but people who can see more will see banding.

Now, on to sound.
http://www.soundonsound.com/sos/sep07/articles/digitalmyths.htm
High sample rates are not required. A CD does plenty. More accurate clocks will improve things, but those have been steadily improving over the life of digital music.

Of course, they won't change the limits of the original recording: magnetic tape getting sticky after a few decades, wearing unevenly, limitations of the microphones, and so on. For sure, that song Neil Young listened to on AM radio on his one-speaker sound system did not provide more fidelity than a modern CD.

Also, my iTunes sounds just fine with the engine pushing 7000 RPM and sunroof open. :)
 
That's excellent, thank you!

I've been arguing with quite a few people in different situations about this subject. What I found most of the time was a strong faith in an imprecise interpretation of the Nyquist-Shannon Sampling Theorem. Let's review its Wikipedia definition: "If a function x(t) contains no frequencies higher than B hertz, it is completely determined by giving its ordinates at a series of points spaced 1/(2B) seconds apart.".

Now, how would we know that an arbitrary analog audio waveform can be described by a function that contains no frequencies higher than 20 kHz? The faithful hold this as a self-evident truth, based on the well-established fact that the average human ear can't register a harmonic waveform with a frequency over 20 kHz. The first fallacy lies right there!

Let's consider a simple case: a sine wave of frequency Fc between 20 and 20,000 Hz modulated by another sine wave of frequency Fm. The spectrum of this signal will contain two sideband frequencies: Fc-Fm and Fc+Fm. Now imagine that either Fc-Fm, or Fc+Fm, or both, fall outside the 20-20,000 Hz range, are filtered out, and are not present in the digitized representation of the waveform. Instead of the original amplitude-modulated signal, a listener will hear just Fc, or Fc plus parasitic signals at other frequencies, perceived as distortions.
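The sideband arithmetic itself is easy to check numerically; here's a quick sketch with made-up frequencies (Fc = 15 kHz modulated at Fm = 8 kHz puts the upper sideband at 23 kHz, outside the nominal audio band, which is exactly the situation described above):

```python
# An Fc carrier amplitude-modulated at Fm puts energy at Fc-Fm and Fc+Fm.
import numpy as np

fs = 96000
t = np.arange(fs) / fs                          # exactly one second
fc, fm = 15000.0, 8000.0
sig = (1 + 0.5 * np.cos(2 * np.pi * fm * t)) * np.sin(2 * np.pi * fc * t)

spectrum = np.abs(np.fft.rfft(sig)) / len(sig)
freqs = np.fft.rfftfreq(len(sig), 1 / fs)
print(freqs[spectrum > 0.01])                   # -> [ 7000. 15000. 23000.]
```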

Let's consider another simple case: a sine wave starts abruptly, is present for a while, and then abruptly ends. The spectrum of such a signal is infinite, definitely not confined to the customary 20-20,000 Hz range. Now, signals generated by biological organisms and by physical musical instruments do not start or end abruptly; they tend to have finite attack and decay times. Yet their spectrum can extend well beyond the customary range. Cutting their spectrum results in distortions. If there are enough distortions, the human hearing system perceives the resulting audio signal as unpleasant, dirty, muddy, muffled, etc.

If we delve into the actual math of the Nyquist-Shannon Sampling Theorem, we'll find a couple of other peculiarities that the faithful are not taking into account. First, there is an assumption that the input signal is digitized precisely, at precisely uniform intervals. Second, that the DAC uses summation of sinc functions or convolution with Dirac delta pulses to restore the waveform perfectly. In practical applications, both assumptions hold only approximately, so the theorem's guarantees don't apply exactly. Imperfections of DAC hardware that can practically be sold at mass-market prices result in distortions.

A simple way to reduce distortions is to increase both sampling rate and bit depth. Let's assume the accuracy of digitization in the amplitude/time domain, expressed as a percentage of their quanta, is the same for a 24/96 and a 16/44 system. Then, with the higher sampling rate and deeper bits, we'll have lower distortions relative to the customary frequency range and dynamic range. But it doesn't necessarily hold true for every pair of 24/96 and 16/44 systems. A higher-frequency, deeper-bits ADC/DAC could have such high distortions relative to its quanta that it translates into higher distortions relative to the customary ranges as well, compared to a higher-quality system with a lower sampling rate and dynamic range.

In order for the improvements to be heard, one needs a sound recording and reproduction system with a low level of distortion, wide bandwidth, and wide dynamic range. This results in a strong "presence effect," described by listeners as "clarity," "transparency," "nuance," "accuracy," "precise stereo image," etc. Just like for Kev, this is not an article of faith for me at all, given my background in experimental physics. A decent large-diaphragm condenser microphone, an accurate professional mixer/ADC/DAC, a pair of high-end studio monitors, and a good Digital Audio Workstation would allow you to conduct experiments proving beyond reasonable doubt that a high-quality 24/96 system does produce a noticeably stronger presence effect than 16/44.


 