http://www.rjamorim.com/test/aac128test/results.html
in blind listening tests, there are people who can tell the difference.
Seen that many times before. Several problems...
1. It is not a randomized test. The subject pool is not picked at random; it consists, ostensibly, of people who visit the site, which largely means people with an interest in vindicating their audio snobbery through these sorts of listening tests. Even in a blind test, this self-selection will skew the results one way or another. This is not scientific.
2. It is not a true double-blind test. The examiner knows which samples are which, and may unconsciously have arranged them in an order that skews the results. There's no documentation of how the samples are distributed or ordered, or whether the order is randomized to rule out order bias. There is no discussion of the full test parameters, really... just that it is an ABX test.
ABX tests, I have found, do not actually test whether a listener can discern the difference between one format and another. What they test is which sample the listener thinks is most like the reference sample. But without a reference sample, can anyone tell the difference? How about testing each song only once? And does knowing there's a reference sample at all skew the results? No control and variable trials are done with different random sample populations to determine which of the parameters, if any, may be biasing the results, so the degree of bias and/or placebo effect cannot be ruled out.
Also, what is the effect on the test results when labels are added, or when each file is mislabeled? What is the error rate when an individual is given the same test repeatedly? How often do they actually identify the same sample correctly? How often do they identify the same sample as being different formats?
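The order-bias concern goes away if the identity of X in each trial is drawn by software instead of being chosen by the examiner. A minimal sketch of what that could look like, assuming a simple ABX setup; the function name `abx_schedule` and the `seed` parameter are made up for illustration and are not taken from the linked test:

```python
import random

def abx_schedule(num_trials, seed=None):
    """Return a list like ['A', 'B', 'B', ...] saying which source X is in each trial."""
    # A dedicated generator keeps the draw independent of any other random use;
    # passing a seed makes the schedule reproducible for later auditing.
    rng = random.Random(seed)
    return [rng.choice("AB") for _ in range(num_trials)]

# Example: 16 trials, with the answer key hidden from both examiner and listener.
schedule = abx_schedule(16, seed=2004)
```

Neither party picks the order, and publishing the seed afterwards lets anyone check the answer key.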
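One way to put a number on the repeat-testing question is the one-sided binomial tail: the chance of getting at least k of n ABX trials right by pure guessing. A minimal sketch, assuming independent trials and a 50% guess rate; `abx_p_value` is a name invented here for illustration:

```python
from math import comb

def abx_p_value(correct, trials):
    """Probability of at least `correct` hits in `trials` ABX trials by guessing alone."""
    if not 0 <= correct <= trials:
        raise ValueError("correct must be between 0 and trials")
    # Binomial tail with p = 0.5: sum of C(n, k) for k = correct..n, divided by 2^n.
    tail = sum(comb(trials, k) for k in range(correct, trials + 1))
    return tail / 2 ** trials

# 12 correct out of 16 trials
p = abx_p_value(12, 16)
```

For 12 of 16 this comes to 2517/65536, or about 0.038. A result like that only means something if the number of trials was fixed in advance and the same listener reproduces it on a retest.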
There is a difference between subjective and objective perception. This test does nothing to compare and contrast the two. It only accounts for subjective perception by asking the subject to identify which two samples sound most alike... but that is not the same as asking someone whether or not they can actually discern what differences, if any, exist between one format and another. But most importantly, there has to be a reason for those differences to exist. If the degree to which a user identifies the correct sample in a series of samples is actually no greater than the degree to which they misidentify the same sample several times, then something in that individual's perception, rather than the object being tested, is flawed.
There are many phases to randomized, double-blind trials and these ABX tests are as insufficient as so-called online IQ tests at determining actual degrees of perceptual transparency between audio formats. Furthermore, the environment is not controlled as it would be in a scientific setting because the playback equipment will vary from subject to subject. This is utterly unscientific.
Scientific experimentation and testing also make predictions about what we should expect to find. If no predictions are made, then the results of perceptual testimony are meaningless. I'll explain:
If there really is a reason for users to perceive AAC and PCM playback as different, then there should be a fundamental and measurable difference in the analogue waveforms that are reconstructed by the decoding algorithms of the two formats. If that difference falls below the thresholds of human perception, then any perceived differences are either due to outside factors or are simply imagined by the subject.
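The "measurable difference" here can be made concrete by subtracting the decoded output from the original and comparing the power of what is left to the power of the signal. A rough sketch, assuming both inputs are already decoded to equal-length, time-aligned lists of samples; `residual_snr_db` is a hypothetical helper, not part of any codec toolkit:

```python
import math

def residual_snr_db(reference, decoded):
    """Signal-to-residual ratio in dB between two time-aligned sample sequences."""
    if len(reference) != len(decoded):
        raise ValueError("sequences must be the same length and time-aligned")
    signal_power = sum(r * r for r in reference)
    residual_power = sum((r - d) ** 2 for r, d in zip(reference, decoded))
    if residual_power == 0:
        return math.inf   # outputs are identical
    if signal_power == 0:
        return -math.inf  # silent reference, non-silent residual
    return 10 * math.log10(signal_power / residual_power)

# Toy example: the "decoded" signal is the reference scaled by 0.9.
snr = residual_snr_db([1.0, -1.0, 1.0, -1.0], [0.9, -0.9, 0.9, -0.9])
```

A single number like this only shows that a difference exists and how large it is. Comparing it against perceptual thresholds would need a per-frequency, per-time-window breakdown of the residual, which this sketch does not attempt.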
It is exceedingly suspicious that these types of tests never predict the precise differences we should expect to find, and which subjects should consequently be expected to identify with some degree of concordance in the test results. And I'm frankly tired of people claiming that "complex audio isn't predictable" or that "perception comes down to the individual", because both PCM and AAC are built around very solid mathematical algorithms whose resulting artifacts should be as predictable as the 12.1kHz alias frequency that arises from sampling a 32kHz sinewave at 44.1kHz without a low-pass filter at the Nyquist limit.
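The 12.1kHz figure is simple arithmetic: a tone above the Nyquist limit (half the sampling rate) folds back to its distance from the nearest multiple of the sampling rate. A small sketch of that folding rule, assuming an ideal sampler with no anti-aliasing filter; the function name is illustrative:

```python
def alias_frequency(f_signal_hz, f_sample_hz):
    """Apparent frequency of a pure tone after unfiltered sampling at f_sample_hz."""
    # Reduce the tone to one sampling period, then reflect it about the Nyquist limit.
    f = f_signal_hz % f_sample_hz
    return min(f, f_sample_hz - f)

alias = alias_frequency(32_000, 44_100)
```

With a 32kHz tone and a 44.1kHz sampling rate, this gives 44100 - 32000 = 12100 Hz.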
Again, that digital encoding systems can reproduce very high-fidelity audio, or indeed any audio at all instead of gibberish, is a testament to the predictive abilities of science. Digital encoding/decoding systems weren't designed by throwing circuits together until the engineer heard something.
So much is understood about digital encoding/decoding by the engineers who design such systems that one can predict mathematically where and when artifacts should be present and what is required to overcome them. To wit... the minimum critical sampling frequency used in digital audio systems today follows from the Nyquist-Shannon sampling theorem, which traces back to Nyquist's work in 1928 and was proved in its modern form by Shannon in 1949.
As for differences between people's perception... well, a random sample population would give us an average population, rather than a population whose perceptual abilities are skewed toward one end of the normal distribution or the other. In fact, several sample populations could be built from listeners with average perception, acute perception, and poor perception, to weigh the differences among the groups... but of course none of that is done in these ABX tests.
There's no reason to take these test results seriously because not enough measures have been taken to ensure that the conditions of the tests are uniform, controlled and truly unbiased.