THE TRUTH ABOUT CD AND DIGITAL
THE TRAGEDY OF THE MISSING INFORMATION
©1984 Mark B. Anstendig
Without exception, the public should avoid all products using digital sound processing, including digitally re-mastered records as well as CD discs. The digital process currently used for recordings and CD discs was introduced long before the requisite technology was perfected. The current technology is incapable of preserving all of the information necessary for the accurate reproduction of a musical performance or any other real-life sound event. The adoption of this flawed technology as the accepted standard for the whole industry makes it impossible to perfect commercially available digital recordings without changing all of the hardware, i.e., without making all digital equipment and CD discs obsolete.
The differences between digital and analog recorded sound are clear, obvious, and not the least bit subtle. If an experienced listener with good hearing cannot hear these differences when comparing the same recording in digital and analog versions, the sound-system must be faulty. Unfortunately, that is almost universally the case, as most sound-systems are not capable of resolving enough detail to reproduce the musical experience contained in the original performance. In fact, the impression that digital is an improvement over analog recordings stems from the fact that most playback systems, particularly the record-playing components (pickup-cartridge, tone-arm and turntable), are not capable of resolving all the detail in the record grooves. Most listeners have, therefore, not yet heard all of the information on their analog records. The vast majority of owners of analog sound systems should, therefore, upgrade their sound systems so they can hear how well their records can sound, rather than invest in a permanently flawed digital system.
For the following explanation of the problems of digital recording, we are indebted to Mitchell A. Cotter, one of the most respected electronic, acoustic, and audiological authorities in the country.
Traditional, or "analog", sound reproduction saves the sound in a form similar to (analogous to) the original sounds. Digital recording, on the other hand, uses an analog-to-digital converter to convert sound into a train of numbers by sampling the sounds at a fixed number of times per second. The speed of this sampling rate determines the detail and subtleties that the digital train will represent. It determines the high-frequency limit of the reproduction and the amount of dynamic subtleties that are captured.
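The conversion described above can be sketched in a few lines. This is only an illustration under stated assumptions: the function name `sample_tone`, the 1,000 Hz test tone, and the 16-bit word length are hypothetical choices, not details given in this paper, and a real converter involves filtering and circuitry that is not modeled here.

```python
import math

def sample_tone(freq_hz, sample_rate_hz, n_samples, bits=16):
    """Sample a unit-amplitude sine tone and round each sample to a signed integer."""
    full_scale = 2 ** (bits - 1) - 1  # largest positive code, 32767 for 16 bits
    samples = []
    for n in range(n_samples):
        t = n / sample_rate_hz                       # time of the n-th sample, in seconds
        value = math.sin(2 * math.pi * freq_hz * t)  # the "analog" value at that instant
        samples.append(round(value * full_scale))    # the number that gets stored
    return samples

# One full cycle of a 1,000 Hz tone at 44,000 samples per second is 44 numbers.
train = sample_tone(freq_hz=1000, sample_rate_hz=44000, n_samples=44)
print(train[:6])
```

The first number in the train is 0 (the tone starts at a zero crossing) and the twelfth, a quarter of a cycle later, is the full-scale value 32767.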
The sampling rate of roughly 44,000 samples per second currently used by the industry (44,100 for CD) is not frequent enough to capture accurately all the subtlety of the higher pitched musical transients. A much-quoted theory, the Nyquist Theory, states that, in order to reproduce a simple, steady, unvarying sound, one must have a sampling rate that is at least twice the frequency of the sound. In other words, a 20,000 hertz tone, which is considered the upper limit of hearing, needs a sampling rate of at least 40,000 samples per second.
Think of each single vibration of a 20,000 Hz tone. Each vibration begins at zero, goes up to a peak above zero, down to a trough below zero, and returns to zero. That occurs 20,000 times per second. A sampling rate of double the frequency, in this case 40,000 samples per second, is therefore the slowest sampling rate that could theoretically sample both a peak and a trough of each vibration. But that would be true only if the sampler were lucky enough to be absolutely synchronized with the peaks and troughs of the sound being sampled. If they are not absolutely in synch, the sampler might be sampling each vibration at zero, which would result in data indicating no sound, or anywhere on the way up or down. Since it is impossible to precisely synchronize the sampler with the peaks and troughs of musical or other non-mechanically produced vibrations, it is misleading to speak about the Nyquist limit in relation to sound-reproduction.
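The dependence on synchronization in this limiting case can be checked numerically. The following is a minimal sketch that covers only the situation just described, a steady tone sampled at exactly twice its frequency. The function name and the `phase_fraction` parameter are hypothetical.

```python
import math

def sample_at_twice_frequency(freq_hz, phase_fraction, n_samples=8):
    """Sample a unit sine at exactly 2 * freq_hz.

    phase_fraction says where in the cycle the first sample lands:
    0.0 is a zero crossing, 0.25 is the positive peak.
    """
    sample_rate_hz = 2 * freq_hz
    out = []
    for n in range(n_samples):
        t = n / sample_rate_hz
        out.append(math.sin(2 * math.pi * (freq_hz * t + phase_fraction)))
    return out

on_peaks = sample_at_twice_frequency(20000, 0.25)    # alternates between +1 and -1
on_zeros = sample_at_twice_frequency(20000, 0.0)     # every sample is (numerically) zero
in_between = sample_at_twice_frequency(20000, 0.05)  # alternates at a reduced amplitude

for name, values in (("peaks", on_peaks), ("zeros", on_zeros), ("between", in_between)):
    print(name, [round(v, 3) for v in values])
```

The same 20,000 Hz tone yields full-amplitude data, no data at all, or something in between, depending only on where the samples happen to fall.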
(It should also be pointed out that, while a sampling rate of double the frequency is the theoretical minimum for reproducing a steady, unvarying tone, no circuits are 100% efficient. In fact, the efficiency of current machines is quite low. But, even with an ideal, completely efficient machine, the ambiguity of the data with a 20,000 Hz tone and a sampling rate of 40,000 samples per second would be 100%, because each frequency can only be sampled twice per cycle.)
If the sampling rate is not exactly double the vibration, as is the case with most frequencies, the tones will be sampled at constantly changing positions in the cycle of each frequency. Think of a disc with a white spot on it that is turning clockwise at 100 times per minute and is lit by a strobe flashing exactly at 100 times per minute. The white spot will stand still. If you speed up the flash a little, the spot will appear to be slowly moving counterclockwise. In digital, such an effect happens at all frequencies unless the sampling rate is dense enough to catch the whole waveform of all sound vibrations.
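The strobe example can be worked through with simple arithmetic. In this sketch the function names are hypothetical, and the disc is assumed to be viewed only at the instants of the flashes, so that whole revolutions between flashes cannot be seen.

```python
def apparent_turn_per_flash(disc_rpm, flashes_per_min):
    """Apparent clockwise turn of the spot between two flashes, in revolutions.

    The result lies in [-0.5, 0.5): positive looks clockwise,
    negative looks counterclockwise, zero looks like standing still.
    """
    true_turn = disc_rpm / flashes_per_min  # real revolutions between two flashes
    # Whole revolutions are invisible, so fold the value into [-0.5, 0.5).
    return (true_turn + 0.5) % 1.0 - 0.5

def apparent_rpm(disc_rpm, flashes_per_min):
    """Apparent rotation speed of the spot, in revolutions per minute."""
    return apparent_turn_per_flash(disc_rpm, flashes_per_min) * flashes_per_min

print(apparent_rpm(100, 100))  # flash matches the disc: the spot stands still
print(apparent_rpm(100, 105))  # faster flash: negative, i.e. counterclockwise
print(apparent_rpm(100, 95))   # slower flash: positive, i.e. clockwise
```

With the flash sped up from 100 to 105 per minute, the spot appears to drift counterclockwise at about 5 turns per minute, which is the slow backward motion described above.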
Furthermore, because the sampling rate is too slow, it generates other distortions as well as its own additional sounds that degrade the final result in all frequencies. Because these degradations of the sound belong only to digital, they are completely new and are, therefore, not at all reflected in the analog-type specifications used to tout digital as an extremely accurate form of recording. Those specifications, which describe analog problems, do not even apply to digital. For example, no digital machine could have any wow or flutter. If it did, it would not be a little better or a little worse than other machines; it would simply be defective.
The patterns of musical sounds are not steady, unvarying tones. Most sound patterns consist of highly complex mixtures of tones of constantly varying dynamics and subtly varying pulse. The rise and fall of the tones as they progress in time (modulation) and their tonal qualities must also be accurately reproduced and differentiated. The sampling rate of 44,000 samples per second is not even close to being fast enough to reproduce that information accurately. Mr. Cotter emphasizes that this limitation is not a matter of imperfection of circuitry but, rather, a fundamental mathematical limitation from the lack of sampling density. It is, therefore, simple ignorance to be speaking of the Nyquist theory in relation to the recording of sound information. Anything approaching a true resemblance of the original sound-patterns first begins at a sampling rate of at least 10 times the Nyquist limit, or twenty times the frequency of the tone being reproduced. And much greater sampling density than that is necessary for true accuracy that can match the fidelity of analog recordings. To improve upon the best of today's analog sound, a sampling rate approaching one million samples per second (one megahertz) would be necessary.
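The sampling densities quoted in this paragraph follow from simple division and multiplication. The sketch below only restates that arithmetic. The function names are hypothetical, and the figure of twenty samples per cycle is the criterion given in this paper, not an industry standard.

```python
def samples_per_cycle(sample_rate_hz, tone_hz):
    """How many samples fall within one vibration of the tone."""
    return sample_rate_hz / tone_hz

def rate_for_density(tone_hz, wanted_samples_per_cycle):
    """Sampling rate needed to get the wanted number of samples per vibration."""
    return tone_hz * wanted_samples_per_cycle

# Density at 44,000 samples per second for several tones.
for tone in (100, 1000, 10000, 20000):
    print(tone, "Hz:", samples_per_cycle(44000, tone), "samples per cycle")

print(rate_for_density(20000, 2))          # Nyquist minimum for 20,000 Hz: 40000
print(rate_for_density(20000, 20))         # ten times that minimum: 400000
print(samples_per_cycle(1000000, 20000))   # density at one megahertz: 50.0
```

At 44,000 samples per second a 20,000 Hz tone receives only 2.2 samples per vibration, whereas the twenty-samples-per-cycle criterion would require 400,000 samples per second for the same tone.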
Mr. Cotter explains that the reason for so much misunderstanding and misinformation regarding digital is that most of the important research and development of digital technology was done under government contract for defense purposes, particularly in the development of radar technologies to disguise the presence of radar signals and in further technologies aimed at detecting those disguised signals. Therefore, the most important research into the necessary sampling density for detection of transients, etc., belongs to different disciplines, much of which lies under the blanket of National Security and is not readily accessible, even to the professional audio world.
The Anstendig Institute has used digital processors for half a year and has carefully investigated the sound quality, including comparisons of audience reaction to programs of digital and programs of analog sound. In addition, Mr. Anstendig has been able to evaluate the digital sound and compare it to analog sound in a listening facility, designed and executed by Mr. Cotter, that is considered by many to be acoustically and electronically perfect and the finest installation of its kind.
Everything in this room, including the walls, ceilings, speakers, electronics, and even the resonance factors of the building materials, has been precisely computed to be an integral part of the sound reproduction. It is a perfect acoustical environment: live, but with absolutely no ringing and no resonances. The speakers, custom-built to Cotter's specifications, radiate in such a manner that the sound remains the same throughout the room.
State-of-the-art recordings that were simultaneously recorded in analog (direct-to-disc) and in digital versions were compared, with and without equalization. (For a true comparison, it is important to be able to change the frequency balance of each recording so that they match each other, because the frequency balances of the analog and CD versions are not the same. In this particular comparison, the CD had louder highs.)1 The difference between analog and digital recordings is clearly audible on The Anstendig Institute's own sound system. But it is definitive to hear the differences in such a perfect room.
In The Anstendig Institute's experience, the various faults of digital limit the fine detailing of the sound and the subtle nuances of dynamics that make up the expressive content. The resultant sound reproduction is, therefore, quite different from, and inferior in expression to the original performance. This is the worst possible flaw, because the most important aspect of music, the expressive content, is changed and degraded.
With professional analog recordings, the necessary information does get preserved on the tape or disc. The frequencies may be out of balance (unequalized) or, as in early recordings, limited in frequency range, but all of the dynamic qualities are saved and can be retrieved.2 (Supposedly, the missing frequencies can now even be electronically reconstructed.) But with digital, that is not the case. Definitely from 1000 Hz on up (and, in our experience, much lower), enough information to reproduce the dynamic time-factors of the music (the precise dynamic fluctuations of the sounds as they progress in time) simply is not on the tape, and the modulation of the rest of the frequency range is imperfect. Since it was not saved in the master recording, this information can never be retrieved. In actual listening, the most easily bearable problem is that the sound of the frequencies above 1000 Hz is coarser and grainier. But, more important, the bloom of the tones is gone and the expression is compromised and changed. The rise and decay of the tones are lost. The sounds are expressionless and the music sounds curiously dead, dramatically so if one is familiar with the original. The differences immediately show up in a comparison of simultaneously recorded analog and digital versions of the same performance using first-rate equipment.
So far, the knowledgeable of the audio world have realized that the low sampling rate limits digital's ability to capture dynamic transients (the very short, sharp, isolated sounds). But it is more important to understand that the ability to capture all dynamic nuance is limited.
The Anstendig Institute has, with invited guests, compared four types of recorders: the finest cassette-recorder on the market, a reel-to-reel tape machine at 15 IPS, a digital processor, and a top-of-the-line Beta Hi-Fi video recorder. All four machines were set up together and simultaneously recorded the same record. The results: the only machines capable of saving enough information to claim to be reproducing the source are the reel-to-reel (at the higher speeds) and the Beta Hi-Fi. The others simply are not reproducing the music, with digital producing the worst results.
Cassette recordings are not much better than digital. The sound is not quite as bad, but the subtleties are not all present in the lower registers and the highs are coarse, without bloom or luster. During our test, no one was moved by the music when the cassette recording was played. The music is almost as curiously dead as with the digital process.
The lack of the important human expressive qualities in digital recordings goes unnoticed for two reasons. First of all, since one does, of course, hear some expression, albeit a falsification, there is no way the listener can know that the expression is wrong and that something important is missing. Secondly, most people (and most musicians too) are no longer used to listening for subtlety because it is not present in most sound systems or in many of the other sounds they are accustomed to hearing in the modern world. Because, for nearly a century, playback systems have not been able to reproduce the most subtle expressive nuances and exquisite modulation of music, many people hear those qualities so seldom that they no longer know about or listen for them. But The Anstendig Institute has found that most people do respond to subtlety when they are exposed to it under the right conditions and their attention is directed towards it. It is, therefore, important that these essential civilized qualities are not lost to society.
Unfortunately, the prevalence of bad radios and cheap sound-systems that do not reproduce what is on the records, as well as of recordings that only approximate the actual recorded performance, has given rise to the idea that with such approximations one can still experience the classical masterworks. But the truth is that either the sound is an accurate reproduction or it is not. If it is not, the listener is not hearing that music. What is heard is a distortion that is really something quite different from the original art-work that was recorded.3 With digital, there is more than simple distortion of the signal. Part of the signal is simply missing and the rest is adversely affected and truncated. Indicatively, as already mentioned, the many distortions causing these problems cannot be reflected in the type of measurements (specs) used to advertise digital recordings.
Unfortunately, the public has come to rely on specs in its purchase of equipment. It is, therefore, important to understand that digital has brought a whole new set of distortions and that measuring techniques for expressing these distortions as universally meaningful specifications have either not yet been developed or not yet been adopted by the industry. The same specifications used to describe analog recordings are used in describing digital even though they apply only to analog and not at all to digital. To quote Cotter: "They are not only talking about apples and oranges, they are talking about totally irrelevant things. They tell you about all of the analog frailties that the digital system does not have, as though the digital systems could ever have them (they can't possibly). And they tell you nothing about the digital system's frailties, which the analog system has none of. In balance, the frailties of the analog system are more musical, even if it has frequency irregularities and 0.5% distortions, etc. That is like life; we live with those and similar distortions all the time. But we don't live with those digital sampling processes and their unnatural distortions. The description of the perfection of these digital audio systems using total harmonic distortion and all that kind of thing is the sheerest irrelevancy."
Even with analog recordings, the buyer is not offered specifications that describe anything more than limited aspects of how that machine would reproduce steady, unwavering, and, therefore, unnatural, mechanical sounds. The currently popular specs tell the reader nothing about how even an analog machine will reproduce real-life, live sounds, in particular the expressive subtleties.
Also, much touted measurements, such as those of digital's dynamic range, are inaccurate when comparing the limitations of the dynamic range of analog with the limitations of dynamic range in digital. For example, the analog recording actually can preserve a much wider dynamic range of information than its specs indicate, while digital, in reality, provides far less dynamic range than the specs claim. And the specified upper and lower dynamic range of digital cannot be fully utilized because, as the signal approaches or exceeds those limits, an unavoidable reaction takes place which causes the machine to create new, ugly, unlistenable sounds which are added to the signal. In all digital, it is, therefore, necessary to keep the signal quite a bit away from the dynamic limits defined by the specs. A stated dynamic range of 80 decibels (dB) might, therefore, only amount to about 60 dB or less of usable dynamic range. But, with analog recording, a great deal of usable information can still be recorded for quite a ways above and below the dynamic range implied by the specs.
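The example of 80 dB shrinking to about 60 dB is a matter of subtracting safety margins from the stated figure. In the sketch below, the two 10 dB margins are hypothetical values chosen only so that the result matches the example in the text, and the function names are likewise illustrative. The conversion from decibels to an amplitude ratio uses the standard definition (20 dB is a factor of ten in amplitude).

```python
def usable_range_db(stated_db, top_margin_db, bottom_margin_db):
    """Dynamic range left after keeping the signal away from both limits."""
    return stated_db - top_margin_db - bottom_margin_db

def amplitude_ratio(db):
    """Ratio of loudest to quietest amplitude for a range given in decibels."""
    return 10 ** (db / 20)

stated = 80
usable = usable_range_db(stated, top_margin_db=10, bottom_margin_db=10)

print(usable)                   # 60
print(amplitude_ratio(stated))  # 10000.0
print(amplitude_ratio(usable))  # 1000.0
```

Under these assumed margins, the span between the loudest and quietest usable amplitudes falls from 10,000 to 1 down to 1,000 to 1.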
The big difference between analog and current digital is that when no sound is being played, the digital is always silent, while the analog recording will often have some low-level background noise. But the ear easily adapts itself to and ignores steady unvarying low-level noise, while the distortions of digital are more disrupting to the ear (if less overtly noticeable) because they are constantly fluctuating along with the audio signal.
The present, extraordinarily ugly situation is fraught with irony because, as a technology, digital recording is not limited. With an adequate sampling rate and adequate dynamic range, it would be the ideal recording medium. But the sampling rate would have to be quite a bit higher than 200,000 samples per second just to approach decent hi-fi sound, and it should be about one million samples per second (1 megahertz) if it is to be an improvement over what is possible with analog recording. Also, the dynamic range would have to be improved by raising the number of bits of the processor, independently of the processes involved in reproducing the time relationships.
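The relation between the number of bits and the specified dynamic range can be sketched as follows. The formula is the usual idealized figure for a linear converter (roughly 6 dB per bit) and is not taken from this paper. The function names are hypothetical, and the result says nothing about how much of that range is usable in practice.

```python
import math

def ideal_range_db(bits):
    """Idealized dynamic range of a linear converter with the given word length."""
    return 20 * math.log10(2 ** bits)

def bits_for_range(target_db):
    """Smallest whole number of bits whose idealized range reaches target_db."""
    return math.ceil(target_db / (20 * math.log10(2)))

for bits in (14, 16, 20):
    print(bits, "bits:", round(ideal_range_db(bits), 2), "dB")

print(bits_for_range(120))  # 20
```

Each added bit raises the idealized figure by about 6 dB, so the word length can be chosen separately from the sampling rate.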
At present, the public has available to it no examples of what the reproduction of music should really sound like. The problems of distinguishing the differences between various kinds of recorded sound without adequate equipment make it particularly urgent that a room with a perfect acoustic and sound-system be available to the public and particularly to the professionals who need such a frame of reference. In fact, many such facilities should exist around the country, if not the world. A direct result of the lack of availability of such a reference standard is the current confusion in the field of acoustics and the well-known, major crisis in the music world due to the deterioration in the quality of music-making, which is, to a great degree, the result of decades of hearing deficient sound-reproduction.
People no longer know what music should sound like. They are hearing substantially less than the content of the great recorded performances. They are being robbed of a crucial part of their cultural heritage in the great recordings of this century, many of them by the composers themselves. In fact, the confusion at present is so total, and wrong, inaccurate sound-reproduction is so universally prevalent, that it is almost too late. That the gross flaws of digital are not immediately apparent, even to professionals, is a major symptom of this general malaise.
Musicians as well as the public have been listening to imperfect cassette-players and sound systems for so long that the poor sound quality of digital is not readily noticed. The tragedy is that some day the world will wake up and realize that all these new digital recordings might just as well be thrown away because they do not contain the most important information and that missing information is irretrievably lost. Along with cassettes and most sound-systems, they are a misrepresentation of art, a subtle, insidious disease that has long been eroding cultural life. The only thing worse--unthinkable even, but possibly more likely--would be for the world to accept today's digital sound and not wake up.
1 The need for equalization is explained in our paper "Sound Equalization in Relation to the Way We Hear".
2 Technically, analog tape-recording also has a sampling rate. The bias frequency is really a sampling rate. But the bias frequency is much higher than the current sampling rate of digital. According to Cotter, analog tape recording can adequately resolve the frequency spectrum up to about 10,000 Hz, after which the signal quality begins deteriorating.
3 This problem is paralleled in the visual world by the poor quality of photo-reproductions of art works and, especially, by the widespread use of inadequate 35mm slide-reproductions to judge visual art by the National Endowment for the Arts and other philanthropic organizations. Due to failings in the photo-reproduction process itself, particularly the inability of any available focusing device to achieve precise focus, these slides are technically incapable of accurate reproduction of art works (or of anything else). What is being judged, therefore, are falsifications that are completely different art works.
The Anstendig Institute is a non-profit, tax-exempt, research institute that was founded to investigate stress-producing vibrational influences in our lives and to pursue research in the fields of sight and sound; to provide material designed to help the public become aware of and understand stressful vibrational influences; to instruct the public in how to improve the quality of those influences in their lives; and to provide the research and explanations that are necessary for an understanding of how we see and hear.