AB TESTING, A MISAPPLICATION OF VISUAL CRITERIA IN AUDIO

(AB testing is a form of testing designed to compare different qualities of sound. In audio, it is used to compare and evaluate the differences in sound between components. The most prevalent form is to switch back and forth between components while a single sound source, usually a recording of music, is playing. Similar methods are used in aural research.)

For decades, controversy has raged in the audio world over the validity of AB testing. While the controversy primarily concerns the AB comparison of audio components, AB testing is also used extensively in scientific research into human hearing and in the evaluation of recorded sound quality. In fact, scientific investigation of human hearing in more than a rudimentary fashion and the investigation of complex sounds, containing both timbral distinction and details of nuance as the sounds flow in time, first became possible with the advent of recorded sound. Before sound recordings, it was impossible to repeat any such sounds exactly the same way, especially humanly produced sounds such as the expressive nuances of music.

The senses differ markedly in their characteristics. In order to investigate any of the senses, it is necessary to differentiate the characteristics of each of the senses, recognize how each works, and apply only procedures pertinent to the sense being investigated. Above all, while testing one of the senses, it is important to not misapply procedures that only apply to another of the senses. Unfortunately, most methods of testing hearing do just that by trying to duplicate the visual procedure of direct comparison. Direct visual comparison has been accepted for centuries as scientifically accurate. But direct comparison is possible only with sight and impossible with all the other senses. That fact is probably the most pertinent scientifically established fact about all sensory perception.

It has long been known that the only capacity of any of the five senses that meets scientific standards for accuracy and dependability belongs to sight. That capacity, known in the optical world as direct comparison, is the visual comparison of objects lying directly next to each other (not an inch or a centimeter apart, but absolutely next to each other; in color testing, one color is laid directly on top of the other). This direct comparison of immediately adjacent visual images reaches its highest level of precision in the comparison of shades of color and gray tone. (Scientifically accurate color charts, determined by direct visual comparison, have existed for centuries.)

That direct visual comparison is the only scientifically accurate capacity of any of the senses was well known in the first half of our century, when it was acknowledged that there was a need to devise a focal-point-exact method of focusing optical lenses in cameras. (The original method of focusing in cameras, the ground glass, is highly inaccurate, mainly because it does not utilize the highly accurate sensory capacity of direct comparison.) But, in the second half of the century, mention of direct comparison has been pointedly avoided in the optical-photographic fields, because the only device that succeeded in utilizing direct comparison and, thereby, achieving absolutely exact (focal-point-exact) focusing, the Messraster,¹ was not owned by the leaders of the industry. From 1939, when the Messraster patent was first introduced, until the inventor's death, the large firms that controlled the German optical industry fought to keep that patent off the market. All of the manual focusing devices that have been available to the public fail to achieve focusing accuracy because they do not utilize the only accurate capability of sight, direct comparison.

In the experience of the author, who was born in 1936 and reached maturity in the second half of this century, the ascendancy of direct visual comparison over all other sensory comparison has been mentioned only once. That was in 1960, in West Berlin, by Joseph Dahl, the inventor of the Messraster focusing device. Mr. Dahl, from whom the author bought a number of Messrasters in the early 1960s, took great pains over a number of months to explain and demonstrate the problems of sensory comparison to the author in order to show why the Messraster, by using direct comparison, is the only device in photography that can focus accurately.

Only the visual comparison of unmoving objects directly next to each other, with no space between them, can claim scientific accuracy. All other forms of comparison using any of the senses, including other forms of visual comparison, do not begin to meet scientific standards of accuracy and dependability. A clear understanding of why that is so, as well as an understanding of why direct comparison is impossible with the other senses, is essential to valid comparison-testing in audio and, in fact, all comparison of sensory impressions.

Why is direct visual comparison the only accurate form of comparison in all five senses? Because it is the only form of sensory comparison that places no demands upon our memory. All other forms of comparison, including visual comparison of objects not directly adjacent to each other, depend on our memory for sensory impressions. And our immediate memory for sensory impressions is notoriously undependable. Place minutely varying shades of color next to each other and we have absolutely no difficulty telling which ones are lighter, shinier, warmer-toned, cooler-toned, etc. But showing them to us one after the other introduces a profound degree of uncertainty and doubt and we will often guess wrong as to their differences. (Mr. Dahl demonstrated this by showing me two pieces of paper, one after the other, and then asking me which was the lighter in tone. I remembered wrong.) That uncertainty can only be definitively resolved by again placing them next to each other, i.e., by direct comparison. This truth is the reason that the first, most basic through-the-lens focusing device, the ground glass, proved inaccurate.

Understanding why the basic ground glass fails to achieve accuracy is fundamental to defining and understanding the problems of comparison, not only in sight, but in all sensory perception. With a ground glass, it is necessary to focus back and forth over the apparently sharpest setting, remembering how far one can go in each direction before the image becomes obscured. Not only is our memory for the images at the various stages of focus undependable, but the eye quickly loses acuity and begins to see longer stretches as sharp the longer one tries to focus. Experienced photographers using a ground glass know that focusing should be done quickly, going back and forth over the point of apparent focus as few times as possible. Otherwise, whatever little bit of accuracy the ground glass can deliver will suffer, as the memory vacillates more and more the longer the process continues. Absolute accuracy through direct comparison was achieved with a ground glass in the Messraster, which is simply a divided ground glass that eliminates the use of memory by allowing the viewer to compare directly the too far and too close settings, right next to each other. The main reason this little known, but very important, device achieves its accuracy is that it utilizes the only accurate sensory capacity, direct visual comparison, and eliminates the need to use the memory.

The other senses have the same problem: memory of sensory impressions is undependable and, with even slightly extended non-direct comparison, the characteristics of the different sensory stimuli blend into each other and the differences become blurred. With smell, the longer one compares different scents, without long waits in between, the more the difference blurs. And the longer one sniffs a scent, the less strongly one can smell it, to the point that one eventually stops smelling it. With taste, flavors quickly weaken and our palate also quickly blends the flavors, losing its ability to differentiate them. For example, salt lovers know that the more salt they use the more they have to add, because, like being subjected to a particular smell for a long period of time, the palate quickly stops tasting the salt until more is added.

Evaluations of delicate differences in tea and coffee flavors, perfume scents, and other similar sensory products, have to be performed by highly sensitive, specially trained experts under specially controlled circumstances. Even the slightest distraction can ruin their work, because of the great demands these activities place on the memories even of those trained individuals who are intimately familiar with the various pitfalls of their work. And great demands are made on these people in regard to physical discipline, poise, personal delicacy and refinement in order to preserve their physical sensitivity.

The body of the listener is another variable to which differences perceived in AB testing can be attributed. The body changes throughout the day. Disciplined people are usually not as sensitive when they wake up as later in their progress through the day. Physically undisciplined people's bodies also vary throughout the day, though not necessarily in the sense of becoming more sensitive over the course of the day. All sensory perception is conveyed to us through our bodies. There are no abstract sensations. It is well-known that various states of tension and relaxation bring with them differing amounts of sensitivity. Disciplines like Yoga, Zen, etc. can heighten sensitivity through manipulation of the body.

The point that must be especially emphasized in regard to activities that demand great sensitivity is that, no matter how naturally gifted the person, a high degree of physical sensitivity is a cultivated thing that has to be purposely achieved and sustained. (Even Mozart, probably the most naturally gifted human being with regard to sensitivity, had to go through long training, had to be exposed to the finest examples of art in Europe, and had to mingle with the most refined, cultivated people of great artistic discrimination and personal discipline in order to develop that gift. Yet most people think they can simply sit down at any time, in any physical state, and discriminate between subtle differences in the nuances of musical performance and the small but often crucial differences in sound quality between audio components.) The need to cultivate physical delicacy and discipline during activities involving sensory perception is well known in the fields of touch, taste, and smell. In those fields, not only are controlled circumstances considered necessary for all critical perception, but the people doing the perceiving are expected to preserve the physical delicacy and discipline necessary for such perception. On the days they work, tea tasters, perfume testers, and wine tasters follow strict physical and dietary regimens designed to keep them in the most sensitive physical state. And their surroundings are carefully controlled to provide an ideally calm and non-distracting environment for extremely delicate perceptive work.

But similar conditions regarding the listener's physical discipline, refinement, and surroundings etc., are seldom, if ever, insisted upon in attempts at audio comparison, even though hearing is the most complex, most variable, most easily disturbed, least dependable and most difficult to monitor of all the senses. That is partly because hearing is also the most taken-for-granted of all the senses, and the least often tested.

There is also an enormous range of differences in hearing acuity. There are people barely able to hear a loud sound and those who hear that sound so loudly that it is almost painful. There are those who can concentrate on a sonic event intensely, for long periods of time, and those who cannot sustain their concentration for more than a second or two and allow any little thing to distract them. There are those who can keep their mind firmly on what is happening in the exact juncture of the present, and those who are either anticipating what is coming or, having missed some detail or lost their concentration, lose themselves in reviewing what they have heard while the music or other sonic event continues, thus, in effect, missing everything. In truth, most normal people who have not had specific training exhibit some form of these aberrations in their manner, i.e., habits, of hearing. In normal life, without utilizing complicated testing that is not completely dependable, it is extremely difficult, if not impossible, even to notice, let alone differentiate, differences in the way we hear. However, correct habits of listening can be trained, and with the help of basic yoga-type disciplines, both Eastern and Western, concentration, the ability to resist distractions, and the ability to keep the mind empty and concentrated solely on the (sonic) events of the moment (of the present) can be developed.

But the usual assumption in society is that people who do not need a hearing aid (i.e., do not have a medically proven hearing disability) all hear essentially alike. Because most of our hearing is used to receive dispassionate information, which is conveyed in the meaning of words and does not depend on the nuance of how it is conveyed, we do not think about all the different ways a sound can be produced or all the different ways we can hear it. Because most of us can make out the basic information in the sounds we hear, i.e., words and their meanings, and most daily communication is mainly to convey such information, we generally do not make demands of sensitivity, especially sensitivity to nuance, upon our hearing and we ignore the differences in hearing that our different physical states (moods) will produce. Yet AB testing deals mostly with the perception of differences in sound qualities and nuances, and not at all with the conveying of information.

It has been necessary to establish the role of physical refinement in sensory perception because it is an important factor in attempts at AB testing in sound, and, for that matter, in all comparison of sonic impressions. There is always that distinct possibility that any differences in the way sounds were heard could be just as much because the listener moved, became upset, tensed, or otherwise changed his/her physical state as because the sounds actually differed in the manner they were produced at the source.

There is also the distinct probability in AB testing that, when one performance or component is less delicate than the one with which it is compared, the listener will still be vibrating with the coarser example when the finer one is played. Since we actually hear the vibrating of our own bodies, the delicacies of the finer example will be filtered through, i.e., produced by, the listener's own more coarsely vibrating body, and, therefore, changed or not heard at all.

In fact, except for an extremely few people with the natural talent of the true orchestra conductor, who can hear with great acuity even when physically and mentally active, the only time people--any people--are actually able to hear and experience the nuances of finely-performed, high-quality music, is when they are absolutely calm, quiet, fully concentrated, and perfectly still. Without specific training, few people are able to place themselves in such a state at will and, therefore, have to wait for the moments when it happens by itself, i.e., when they just happen to gravitate into the right mood. Most of us have certain recordings that can make us cry, or uplift us, or cause such piercingly exquisite experiences that we feel like our heart has jumped into our throat, to utilize a particularly apt colloquial description. But we also know that we cannot just sit down and have those experiences happen at will. We have to wait until we are in the right "place" to be able to experience them.

Since fine music is seldom available at the same moment that most people are physically able to be receptive to it, few people ever hear and enjoy the felicities of fine musical performance, and those that do are not able to do so very often or for very long periods of time. Therefore, few people have even the slightest preparation for any kind of sonic comparisons. They lack the necessary acuity, awareness of the need for physical discipline, practiced concentration over long periods of time, etc. to be dependable subjects. In AB, or other relatively quick forms of comparison, there always has to remain the suspicion that differences in how the sound was heard were as much due to physical instability in the listener as to differences in the sound.

I have made the point that, because direct visual comparison is generally easy to perform and the most often utilized method of differentiation in our lives, we tend to take it for granted that we can accomplish the same thing with the other senses. I have also shown that direct comparison is simply not possible with the other senses because no sensory comparisons with sound, smell, touch, or taste can utilize direct, simultaneous comparison and must, therefore, use the memory. (Touch would seem to come closest to visual comparison because most things being touched do not change appreciably over the short periods needed to attempt comparison, and we can simultaneously touch two different things with our two hands. But no two hands or fingers are exactly the same. Alternately touching two objects with the same body part again makes demands on our undependable memory.) I have also made the point that, in sound, there must be even greater uncertainty than with other senses, because sound is the most fleeting of sensory stimuli. Sound cannot linger, as in taste or smell, and cannot remain still, as in touch and direct visual comparison. Only carefully engineered mechanical sounds can be absolutely steady and unwavering. All other sounds, even seemingly sustained ones, are constantly changing, i.e., fluctuating, in time.

Am I saying that sonic comparisons are impossible? Not at all. I am saying that they must be accomplished in a completely different, unrelated manner from visual comparisons and with even more care than the extraordinary care taken in serious comparisons of taste, touch, and smell.

But how? The answer lies in understanding that quick, immediate comparisons do not work. The way to make dependable comparisons is through great familiarity with the audio components, sounds, performances, etc. that are to be compared. They must be listened to enough times for the persons doing the comparing to be sure they have really heard and experienced all of the subtle content of the sounds. And once they are sure they have accurately heard the content of the sounds, they must become familiar with it. That usually means living with the sounds over an extended enough period of time to allow the listener to be fresh and attentive during listening periods. The whole process can take hours, days, or even weeks. With familiarity, memory becomes dependable, as long as proper precautions have been taken to maintain the same sound-quality, the same room conditions, a refined state of body, etc., during all listening.

Furthermore, to be truly accurate, all listening comparisons, including those in medical testing of hearing, should be made under circumstances in which the listeners feel completely comfortable, as they would in their own homes. Listening periods should not exceed the listener's comfortable span of attention, and the sonic programs to be compared must be repeated often enough for the listeners to be absolutely sure they are familiar with the programs. Above all, during these periods, there should be no interruptions or physical exertions on the part of the listener that might disturb his/her physical equilibrium, which means that the programs have to be turned on and off by someone other than the listener.

But these preconditions should not be misconstrued as possible means of better conducting AB comparisons. AB testing has absolutely no validity in audio comparisons. Far from being a means of bringing scientific accuracy to audio evaluations, as believed by many audio practitioners, AB testing is based on human capacities that are undependable and do not at all fulfill the requisites of scientific accuracy. There are no exceptions. But the invalidity of AB testing is particularly true when music is used for the comparison, especially when a comparator device is used to switch back and forth between audio components while the music is playing. For that process to be at all logical, the exact same portion of the music would have to be heard each time the switch is operated. But the repetition of exactly the same short sequence of music (or any other sonic program) would bring with it its own irritations that would disturb the listener and negate the test.

The Anstendig Institute strongly recommends that all people professionally involved in AB testing and other comparison of sensory impressions thoroughly study and understand, through first-hand experience and demonstration, the principles involved in the various available photographic focusing devices. It is important to an understanding of all sensory perception to know why these devices that use the human eye are inaccurate. It is also of the greatest importance to understand the truth about depth of field, in the photographic sense: that it really pertains to unsharpness, not sharpness; that depth of field does not exist in the sense of depth of sharpness, but is, rather, a description of the extent to which increasing unsharpness can be tolerated before it disturbs the viewer, a parameter that is entirely subjective and, therefore, undependable because it is determined by and changes with the sensitivity and mood of the individual viewer.

This understanding of photographic images and the effect of sharpness is so crucial because it is with visual comparisons that the human being usually begins conscious, purposely initiated sensory comparisons. With visual comparisons, we first and most dependably develop our sense of discrimination, i.e., our ability to differentiate and evaluate subtle differences in all things. But there are important shortcomings and misunderstandings in photographic imagery that carry over into all visual imagery and, unless we are aware of them, ultimately affect our powers of discrimination. Along with sounds, photographic imagery is probably the most omnipresent element and influence in our modern life. It pervades everything we do, especially when we mistakenly attempt to utilize processes pertinent only to sight in our work with the other senses.² The misuse of visual criteria in hearing would by itself be bad enough. But the fact that our understanding of visual images is based on wrong assumptions makes the use of visual criteria all the worse.

Unfortunately, a large part of the audio research that has already been published has utilized AB or similar testing that is simply a misapplication of visual criteria in the realm of sound. All of that research has, therefore, to be considered invalid. If any valid conclusions have been reached by these methods, their acceptance will have to wait until they can be confirmed by means that are scientifically accurate. It is difficult to comprehend the enormity of this situation. Whole edifices of scientific thought, methods, and practice have been built upon this scientifically invalid procedure. No matter how the procedure is refined (as in double-blind AB testing, in which neither the subjects nor those administering the test know which component is being heard, and the components are presented in such an order that the subjects cannot guess their identity), there is no possibility of dependably recognizing subtle differences. In AB testing, any differences being recognized and compared have to be so large that they should be apparent to the same people in any kind of listening.

An argument has blazed for years between those in the audio community who swear they hear subtle differences between components they have lived with and those in the AB testing community who insist that AB testing has proved those people wrong -- that those people must be imagining the differences, because carefully controlled AB testing has shown that the differences do not exist. There are many pitfalls in any kind of listening. But we have seen that those who live with their components before evaluating them could very well be correct in their evaluations. At least they are using a valid procedure.

What is clear is that those using AB testing have not been using a valid procedure. Unless the misconceptions of sight and sound in the scientific world are quickly cleared up, when current or future generations finally realize the truth, they will have to throw out most previous research and, therefore, almost their whole fund of knowledge, because it will all have been based on invalid premises and carried out under invalid conditions.

¹See the Messraster patents of 1939 and 1966, in the USA and all Germanies. The patented device, well known in the optical and photographic fields at the time, is the only focusing device that utilizes direct comparison as the actual focusing method, and its patent is the only one that claims focal-point-exact focusing of lenses in cameras. The correctness of its assumptions was attested by the leading expert witnesses in the field of optics of the time, the optical institutes of the technical universities of Germany. (The Anstendig Institute has copies of the affidavits from the Optical Institute of the Berlin Technical University, which, at the time, was the leading optical institute of the world.) From the time it appeared, the Messraster was fought against by the industry and kept off the market. It still remains the only possibility of achieving dependable focal-point-exact focusing in all photography.

²See our paper, “The Misapplication Of Visual Criteria In Sound”.

The Anstendig Institute is a non-profit, tax-exempt, research institute that was founded to investigate stress-producing vibrational influences in our lives and to pursue research in the fields of sight and sound; to provide material designed to help the public become aware of and understand stressful vibrational influences; to instruct the public in how to improve the quality of those influences in their lives; and to provide the research and explanations that are necessary for an understanding of how we see and hear.