STEREO: A MISUNDERSTANDING

THE THEORY, SOUND-SYSTEMS, AND PROBLEMS OF HEARING

Revised 1984

There is a common misconception that the addition of stereophonic sound-reproduction was the necessary, correct step in perfecting monophonic recording. It is believed that, because we hear with two ears, sound should be recorded with two microphones if it is to sound natural. It is also believed that stereophony exists as a natural, scientific phenomenon. Neither belief is correct. The attempt to reproduce the way sound is heard by means of stereophonic sound reproduction is a misunderstanding that is the result of a fault in logic. Since recording is a duplication of sounds, only the sounds can be duplicated, not the manner in which they are heard. The introduction of stereophony and its universal acceptance have had the unfortunate effect of slowing progress in the improvement of recorded sound quality and keeping the general level of musical experience substantially below that which is truly possible, both through recordings and in live performance.

Hearing is classically accepted as the most important of the senses. Of all five senses, the effects of hearing are the most powerful. It is humanity's chief means of becoming familiar with and communicating emotions. Today, recordings are the means through which the greater part of society is introduced to the vast scale of human experiences that can be had through sound. It is important, therefore, that society take a careful look at the universal use of stereophony in sound reproduction.

I. THE FLAWED LOGIC

The word "stereo" is currently used as a blanket designation for all sound reproduction. This is a misrepresentation. Stereo is only a means of achieving an effect of directionality. In fact, it is only one of many ways directionality can be sonically produced, and a very limited one at that. Stereo is limited to producing only a frontal, horizontal plane, with no means of reproducing sounds that come from above, below or behind, nor can it accurately reproduce depth. (Impressions of depth are a combination of the arbitrary disposition of the loudspeakers and the listener in relation to the listening room, which is different in each situation. It is a form of auditory illusion, not an accurate duplication of the depth of the recorded event.)

Most people have the impression that the stereo signal is a complete entity that is made up of two incomplete halves of a complete signal, each of which essentially contains only half of the sounds. That is not true. When two microphones are used, each channel is a single, complete monophonic signal documenting every bit of the particular sound event, but each from a slightly different position (in theory, only about as far apart as our two ears, i.e., the width of a human head).

It is important to understand that there are no stereo sound sources. From any given position in space, all sound sources are monophonic.¹ In live sound as well as sound reproduction, the effects that produce the impression of dimension and direction take place within the listener and not in the sound source(s). The stereophonic signal does not, in itself, include the spatial, stereophonic effect. It only includes two mono signals, which would produce no effect of spatial dimension if they were played by themselves, played through two separate speakers standing next to each other, or electrically combined and fed through one speaker (played back monophonically). The spatial effect only occurs through separation of the two signals in space in relation to the listener, and that effect changes in relation to any change in position of the two speakers and the listener. Live sounds may occur at various distances and in various directions in relation to the hearer, but each one is always a separate monophonic sound whether the source is stationary or moving. Sounds are only directional in relation to the hearer. They are given directionality during the act of hearing, which occurs after the sounds are produced or reproduced and therefore has nothing to do with the manner in which the sounds are produced.
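The phrase "electrically combined and fed through one speaker" can be illustrated with a minimal sketch. It assumes each channel is simply a list of sample values and that combining means averaging the two channels sample by sample; that is one common convention, not a method taken from this paper, and the function name and sample values are hypothetical.

```python
def downmix_to_mono(left, right):
    """Average two equal-length channels, sample by sample, into one signal.

    Assumption: averaging (rather than plain summing) is used so the
    combined signal stays within the range of the two inputs.
    """
    if len(left) != len(right):
        raise ValueError("channels must have the same number of samples")
    return [(l + r) / 2.0 for l, r in zip(left, right)]


# Two complete monophonic signals of the same event (made-up values).
left = [0.0, 0.5, 1.0, 0.5]
right = [0.0, 0.25, 0.5, 0.25]

# One combined signal: nothing in it says where any sound came from.
mono = downmix_to_mono(left, right)
```

The result is a single list of samples. Whatever spatial impression the two channels could have produced depends on playing them through separated speakers, which is the point made above.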

Stereo is based on the premise that, because sound is heard with two ears, the correct way to reproduce sound is to simulate the way it is heard, i.e., by recording two separate signals, using two microphones separated by a distance equivalent to the width of a human head. That is a misunderstanding of the realities. Stereophony as thus defined is an attempt to reproduce the way sound is heard. This is illogical and impossible. Human hearing could never be duplicated in the recording process because hearing consists of more than just two ears. The shape of the ears plays a role in distinguishing the direction of sounds, and the rest of the body also plays a role in the hearing and experiencing of sound. None of these aspects of hearing can be duplicated by microphones.

The hearing experience takes place only in the hearer and only after the sound has been reproduced by the sound system. This phenomenon is incidental to and completely separate from both the production of the sounds and the characteristics of the sounds. What comes out of the speaker is a duplication (more accurately, an approximation) of the sound as it was produced by the source and colored by the space in which it was produced. It is not, nor can it ever be, a duplication of the hearing process. In fact, the shape of the sound source and the materials of which it is made determine the characteristics of a sound. Any recording, whether in stereo, quad, or any other mode, can only duplicate the sound as produced by the sound source, not as heard by a listener. The characteristics of the sound source and of the sound itself are what must determine the technical means used to record it. How a sound happens to be heard is completely incidental to and has no bearing on the production or accurate reproduction of that sound.

Stereophony should play no role in considerations regarding sound quality in the construction or evaluation of components, even those meant for stereo reproduction. All auditioning and evaluating of the accuracy of components, especially loudspeakers, whether by the manufacturer or the buyer, should be done with a mono signal, with no attempt to reproduce spatial effects. All aspects of the rest of the sound system, except the depiction of space, should also be auditioned monophonically. The only function of the electrical components of a sound-reproduction system and the loudspeakers is to produce an acoustic signal that as closely as possible resembles the electrical signal fed into it by the source. Nothing more! In fact, it is impossible for a sound system to do anything more than that. The signal itself does not, and cannot, include any effects, such as the depiction of space. Those effects take place in the listener after the sounds have already been reproduced monophonically. Technically the aim is to reproduce two entirely monophonic signals as accurately as possible. The two channels should be kept completely separate from each other, all the way through the sound-system until they have been reproduced monophonically in space by the loudspeakers.

The fact that the two signals of stereo are cut in the same record groove and that most components have two channels that share the same power supply and therefore have some interaction is merely an economic compromise. If it were possible, each channel should be absolutely independent from beginning to end. But that is generally impossible, because, to perfectly synchronize the channels, the two signals have to be combined somewhere. The two channels are combined either in the record groove or as parallel tracks on the same tape. On any system that would be practical for the end-user, some interaction of the signals is unavoidable, whether in the signals in the record groove itself, in the needle as it traces the signals, in cross-talk between the channels of the recorder, or in the sound system.

The problem of designing a sound-system, including building a loudspeaker, is to arrive at the most accurate possible reproduction of each separate signal that is fed through it, whether that signal is a monophonic signal or one channel of a stereo recording. Even if two or more signals are ultimately to be combined to produce spatial effects, the only way to assure that each signal will be reproduced as accurately as possible is to reproduce each signal as separately as possible. As will be shown in Section V, the effects of combining two signals distract the listener from the more important qualities of sound. Therefore, all system development, testing, and evaluation of sound-quality should be done monophonically, even if stereo is desired. Especially with loudspeakers, any technical decisions of design, such as the size and shape of the speaker or how the drivers are mounted in the speaker, should be arrived at only with the need for accurate rendition of a single signal in mind. Practices such as mounting the drivers unsymmetrically in stereo-pairs within the cabinet have nothing to do with how accurately that speaker will reproduce sound and can, in fact, compromise the sound-quality if the preferred position of the drivers for stereo listening is not the ideal position for accurate reproduction of a single signal.

III. IT IS IMPOSSIBLE TO KNOW THE REAL SPATIAL RELATIONS OF A STEREO RECORDING

In order to know if a system's reproduction of spatial relationships is accurate, one would have to know if the reproduction matches those relationships exactly as they were at the microphones during the recording. Since heads are differently shaped and no one can be in exactly the same place as the microphones, the spatial effects of direction, depth, etc., will be different for each person in the room. Even the engineer, listening with speakers or earphones, who decides on the microphone placement and mixes the signals to his liking, is only deciding subjectively how he wants the impressions of space. The monitoring equipment has already changed the spatial relationships and made them different from the spatial relationships at the microphones. And those relationships in the monitoring booth will be different from every other listening room.

An attempt to achieve precise reproduction of the spatial dimensions of a sound event by means of stereo is therefore doomed from the beginning. All that can be achieved is a particular spatial effect that may be preferred by the particular listener but cannot lay claim to being a reproduction of the original. Therefore, the prevalent procedure of evaluating sound-quality on the basis of the reproduction of such spatial effects as "soundstage", "imaging", "dimensionality" (terms currently used in professional circles) or on the basis of impressions of height, width, or depth is futile, since it can never be known whether the reproduction matches the original. All that is possible is to prefer a certain sound-system's reproduction of spatial dimensions over that of another system, but it is not possible to know when the reproduction corresponds to the original, even if the listener had been in the room in which the sound originated.

The characteristics of sound are so bound up with the size, shape, dimensions and materials of the source that they can only be reproduced exactly by duplicating the entire original physical situation. That would mean the same musicians in the same hall (or an exact duplication of the hall), sitting in exactly the same positions, etc., which is an impossibility. Therefore, absolutely exact reproduction of the spatial characteristics of a sound by another sound medium is impossible. It certainly cannot be achieved by differently shaped objects of different reflectivity, i.e., speakers, in a differently-shaped space of a different reflectivity, i.e., the listening room. Thus, and definitively, any attempt at reconstruction of the spatial characteristics of a sound source can only be a flawed approximation, which the listener can never be sure is the way the original sounded.

Furthermore, in stereo reproduction, there is only one very small area, equidistant from the speakers, within which the volume of the two separate channels is balanced. The equalization, i.e., the loudness of the different frequencies (highs, lows, middle, etc.) in relation to each other, can also be different in various parts of a room. But a room's equalization can be compensated for during playback and is a variable that has to be adjusted anyway for differing volume levels in relation to an individual listener's hearing at the time of the playback.² The one perfect area for the listener relative to the two stereo speakers is a small area in the exact center between and in front of the speakers, which extends only a small distance front to back. In any other positions, not only are the stereo balances wrong, but part of the content is missing. Obviously, for larger numbers of people (theater productions, movies, etc.), mono sound reproduction is more accurate for the bulk of the audience; it is, in fact, the only non-flawed possibility of reproducing the entire musical content.
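How quickly the volume balance shifts away from the center position can be sketched with simple geometry. The following is a rough illustration only: it assumes two identical point-source speakers in a free field, where level falls off with the inverse of distance, and it ignores room reflections entirely. The speaker positions, listener positions, and function name are hypothetical and are not measurements from our tests.

```python
import math


def level_difference_db(listener, left_speaker, right_speaker):
    """Left-minus-right level in dB at the listener's position.

    Assumption: free-field inverse-distance law, so the level ratio of
    two equally driven speakers equals the inverse ratio of distances.
    """
    d_left = math.dist(listener, left_speaker)
    d_right = math.dist(listener, right_speaker)
    return 20.0 * math.log10(d_right / d_left)


# Speakers 2 m apart; positions are (x, y) coordinates in metres.
LEFT = (-1.0, 0.0)
RIGHT = (1.0, 0.0)

# Listener on the center line, 2 m in front of the speakers.
centered = level_difference_db((0.0, 2.0), LEFT, RIGHT)

# Listener moved 0.8 m toward the left speaker, same distance back.
off_center = level_difference_db((-0.8, 2.0), LEFT, RIGHT)
```

Under these assumptions the centered listener gets equal levels from both speakers, while the off-center listener hears the nearer speaker a few decibels louder. A real room adds reflections that this sketch does not model.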

IV. THE POINT OF ALL MUSIC IS TO EXPRESS SOMETHING

The expressive content of sounds is contained in the dynamic variations of the sounds. In fact, it is the dynamic content of the sound. The presentation of the dynamic subtleties is, therefore, the most important problem of sound reproduction.³ Problems of instability in the sound, which can plague the stereo spatial effect relative to the listener's position in the room, do not occur in the dynamic content of the sounds, which remains the same throughout the room. No matter how the balance of frequencies or stereo imaging may be changed, the sounds retain their dynamic-expressive character relative to each other as they flow in time.

Until the advent of stereo, spatial relationships were unimportant, even undesirable, in the bulk of the world's music. In most classical music, the introduction of directional effects in the sound-reproduction distracts the listener from the important factors that actually contain the musical experience. The most important aspects of sound, especially those of classical music, have nothing to do with spatial effects and can be reproduced satisfactorily in mono.

A stereo signal introduces extraneous "effects" that distract from the more important dynamic aspects of music. Except for the pickup cartridge, stereo effects have nothing to do with the quality of the system components. The reason is that, in the sonic arts, spatial relationships are a very insignificant component of sound and are particularly insignificant in music. In most classical music, they can be eliminated without at all degrading the quality of the artistic experience.

The reason spatial effects distract from the expressive qualities of music lies in the limitations of human consciousness. Most people can only concentrate on one thing at a time, which, in music, is usually the melodic line. Few can concentrate on two things at a time. Since our consciousnesses are too limited to be simultaneously aware of all the components of music, concentrating on spatial effects distracts from the important aspects of music.

To understand why the stereo-spatial aspects of music-reproduction have been accorded such predominance, to the point of obscuring the truly important aspects of music, one must know that the easiest-to-hear aspects of sound are the directions the sounds are coming from. The most difficult-to-hear aspects are the subtle expressive nuances.⁴ Many people cannot hear subtle expressive nuances. Few are oriented towards listening for those nuances and practically no one takes pains to be sure they are hearing them correctly. Furthermore, long-playing record-playing equipment has, without exception, not as yet been able to reproduce the finest nuances of records. The record-listening public has not, therefore, experienced nuances as fine as they can be. It is taken for granted that they are hearing exactly the same nuances as in the original.

In controlled situations, our institute has found that, although they do experience something, many people are incapable of accurately hearing expressive nuances either live or reproduced. They experience either a coarser form of the actual emotion of the performances or a completely different emotion.⁵ Even those capable of hearing fine nuances cannot hear them the moment they sit down to listen, especially with recordings. It takes quite a while for most people to settle down enough physically to begin to register the subtleties of the music and to experience the emotional content. To understand why, one must realize that what is heard is not the sound vibrations coming from the sound source; what is heard is the vibrations of the hearer's own body when it is caused to vibrate by the sound-waves striking it. Therefore, any nuances finer than the vibrational state of the body itself are not heard. Essentially, unless the body is in a physical state that is as fine as the music being listened to, the music is filtered through, and degraded by, the coarseness in the way the body is vibrating. This point is crucial to understanding why spatial effects figure prominently in most people's considerations of sound reproduction. Besides being easy to hear, spatial effects do not demand a particularly great refinement of body. Being able to notice and make out spatial dimensions and directional effects impresses listeners who are not hearing the full content of the music, and gives them the impression that they are getting something out of the recording, when they are actually missing the point of the music.

If, from the beginning of a listening session, one carefully observes which aspects of the music one becomes progressively aware of, one will notice that, besides notes and words, the first things one is able to hear are the simple spatial relationships (right, left, center, etc.). The last thing one is able to hear is the expressive, i.e., the emotional, content. The notes and spatial relationships can be called the "informational" aspects of sound, while the expressive content can be called the "experiential" aspect.⁶ The point to be made is that, without the experiential aspects, there really is no music, and that a distortion or change in the expressive content of a recorded performance is tantamount to changing the words in a sentence so that they mean something totally different from what the writer expressed. In other words, a complete falsification. On the other hand, it makes no difference to the quality or intensity of the way one experiences the expressive content of the music if the so-called "sound stage" is changed to give one or another impression of height, depth, and width, nor does it matter if the orchestra seems to be spread out in front of the listener (unless the music was specifically written for stereo or has some of the expressive content contained in the directions of the sounds; the Beatles' album Sgt. Pepper's Lonely Hearts Club Band has excellent examples of both).

The spreading out of the sound in space is totally unimportant to and contrary to the aims of most music written before stereo became popular. In their orchestration, composers took great pains to create particular sound colorings by blending together the sounds of different instruments. Halls were designed so the sounds would thoroughly blend together before reaching the listener. When a conductor has balanced his orchestra, there is no need for separation of the instruments by spreading them out in differing directions in order to hear the different voices; whatever is supposed to be heard can be differentiated even from so far away that all the sounds of the orchestra essentially come from the same direction. Similarly, if a recording of such a well-balanced performance is correctly equalized to match the original, the balance that the conductor has achieved can be heard in mono, without the supposed help of stereo “separation”. This is an important point for the music-loving public because it means that older recordings of such excellent performances can, to a great degree, be restored since it is mainly their imbalances in the frequencies that obscure their detail and not a lack of stereo effects.

One must assume that composers know what their music should sound like, but, originally, composers were singularly unimpressed by stereo. Virgil Thomson went so far as to call it a "technological pretext" giving the recording companies "another excuse for recording the standard works all over again" (A Virgil Thomson Reader, p. 144). Another composer has mentioned that stereo is an excuse to sell new, more expensive equipment. No composer whom I asked or with whom I listened to music was the slightest bit interested in the depiction of spatial effects.

V. CONSCIOUSNESS IS LIMITED

Few people can concentrate on more than one thing at a time; but music consists of many things happening all at once. In fact, music is the ultimate consciousness-expander because, if you are not a Mozart, there is almost always more to be aware of than is humanly possible. Even with a single melodic line there are both the notes and the expression to be conscious of. For all but a very few particularly "gifted" individuals, consciously registering the expressive content of music demands every bit of concentration, awareness, and poise that can be mustered, especially when the expression is as fine and delicate as it should be in most classical music. In the finest ear-training and conducting classes, which even included seasoned professionals, there were enormous differences in sensitivity to nuance and expressive content. Particularly interesting is that neither the ability to recognize tones (perfect pitch) nor extraordinary memories that allowed students to write down, from memory, anything the teacher dictated, was of help in hearing the expressive content. For example, many conductors (and other musicians too) with amazing ears for recognizing notes and hearing mistakes were and are strikingly deficient in expressive interpretive qualities. In most of these cases, the orientation towards the informational (mental) aspects of music takes up all of their powers of concentration and keeps them from registering the nuances of expression. Therefore, the addition of artificial informational material, such as the spatial effects of stereo, will distract most people from the more important experiential aspects of music.

VI. THE BODY IS SENSITIVE TO DIRECTIONAL IMBALANCES IN SOUND PRESSURE LEVEL

While the depiction of directional effects has little effect on the experience of most fine music, monophonic reproduction with only one loudspeaker is not the solution. That is because the body is sensitive to unequal sound-pressure levels, i.e., whether or not the sounds around it are of equal strength (volume). The body itself, which is highly sensitive to physical imbalances, has to recreate the vibrations produced by the sound-source, and this happens most effectively when the whole body is equally subjected to those vibrations. Music coming predominantly from one side creates an uncomfortable feeling of imbalance that is especially disturbing and distracting when the body is in the requisite relaxed, sensitive state necessary to hear fine musical nuances. Our tests have shown that music from four equidistant speakers, arranged as in quadraphonic listening, is the best arrangement, whether they play mono, stereo, or quad. The sound-pressure level is then most evenly distributed around the body.

The body's sensitivity to the lateral balance of sound is one reason why stereo seems to many to be superior to mono with one speaker. In stereo, when the listener is located exactly between the two speakers that are balanced for volume level, the sounds at least come from both sides. But in this respect, mono is still preferable, because the sound from both sides has the same volume level, while it varies in stereo. Because the musical experience is predominantly physical, mono with at least four speakers surrounding the listener is the most effective way to experience recorded music.

VII. EPILOGUE

Originally, stereo was thought to be the next necessary step in perfecting sound-reproduction. But it was not. Monophonic sound-reproduction was still gravely flawed when stereo was introduced. The first step should have been to perfect monophonic sound-reproduction. Some companies were well on their way towards doing so. The last monophonic Mercury recordings were very close. It remained mainly for playback techniques and equipment to be perfected in order to retrieve the information that was in the grooves.

The introduction of stereo halted progress by introducing a whole new set of problems, namely the preservation and reproduction of two signals simultaneously. The state of the technique at that time was not able to combine two signals and still preserve the quality already achieved in monophonic recordings, especially not in phonograph pick-up cartridges. Sound quality, particularly in the playback, deteriorated markedly.

It is an individual's prerogative to want sound-reproduction that includes some sort of depiction of the placement of sounds in space. But to call stereophony accurate sound reproduction is a falsification. Stereo is an extraneous effect added to sound, a special phenomenon similar to 3-D in photography and cinema: both stereo and 3-D are effects that may be interesting, even "kicky", but they are only effects and have little to do with the way we really hear or see.

Since hearing the expressive content demands all of most listeners' concentration, the addition of other effects, such as those of stereophony, keeps the listeners from experiencing the real content of the music if they pay attention to those effects. Such distracting sound-reproduction has been the rule for over three decades among laymen and even among professionals, most of whom use recordings to help study scores (with the prevalence of recorded sound in our society, those who do not outright use recordings for study still cannot avoid listening to and being influenced by recordings). The legacy of stereophonic sound-reproduction is a loss of sensitivity to and awareness of delicate, fine interpretative nuance in music. A full understanding of this fact must be cause for considerable alarm, because music is the flagship of a society. It leads, serves as an example for, and sets the tone of every other civilized pursuit within that society. It is the best civilization has and must be preserved at the highest possible levels.

¹Even sounds consisting of many combined sound sources, as in recording techniques using many microphones, are monophonic. With multi-miking, each microphone documents the complete monophonic event from that microphone's position. Each channel of the stereo-signal plays a monophonic signal consisting of the combined signals from its microphones. But the use of many microphones is not even stereo. It is a whole new technique that has nothing to do with either the natural spatial relationships of the original sounds or the principles of stereo.

²See our papers on sound equalization.

³For most recordings, even digital, it is necessary to compress the overall dynamic range of the performance. That should be done section by section, i.e., the louder sections should all be reduced the same amount and the softer sections all raised in volume by the same amount. In that way, the dynamic subtleties within each section of the music will be preserved. Automatic dynamic range expanders are not desirable because they will expand and compress the dynamic range of everything, even the dynamics within a single melody, thus changing the whole expressive content of the performance.
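The section-by-section procedure described in this note can be sketched as follows. This is a minimal illustration under stated assumptions, not a description of any actual mastering tool: each section is assumed to be a list of sample amplitudes, the gain factors are chosen by hand, and the function name and values are hypothetical.

```python
def compress_by_section(sections, gains):
    """Scale every sample in a section by that section's single gain.

    Because one fixed factor is applied to the whole section, the ratio
    between any two samples inside the section is left unchanged.
    """
    if len(sections) != len(gains):
        raise ValueError("need exactly one gain per section")
    return [[sample * gain for sample in section]
            for section, gain in zip(sections, gains)]


# A soft passage and a loud passage (made-up amplitudes).
soft = [0.02, 0.04, 0.03]
loud = [0.60, 0.90, 0.75]

# Raise the soft section and lower the loud one, each by one fixed factor.
soft_out, loud_out = compress_by_section([soft, loud], gains=[2.0, 0.5])
```

The overall range between the two passages shrinks, while the relative dynamics inside each passage stay as they were. An automatic device that changes its gain from moment to moment would not preserve those internal ratios.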

⁴Various papers of The Anstendig Institute deal with the problems of hearing fine nuances, particularly those due to the fact that the body must be vibrating as finely as the nuances or they will be changed and degraded by the vibrations of the body itself.

⁵The author's insights into the hearing of expressive nuance come from many years in the ear-training classes of some of the finest music schools and long testing with volunteers at The Anstendig Institute.

⁶This is explained in other papers of The Anstendig Institute, particularly “Hearing: The Informational and the Experiential”.

Papers on related subjects are available free of charge on request.

The Anstendig Institute is a non-profit, tax-exempt, research institute that was founded to investigate the vibrational influences in our lives and to pursue research in the fields of sight and sound; to provide material designed to help the public become aware of and understand vibrational influences; to instruct the public in how to improve the quality of those influences in their lives; and to provide the research and explanations that are necessary for an understanding of how we see and hear.