I've read lots of posts here and chatted with other musicians at music school about audio latency. Some people seem obsessed with getting ever lower latency figures, and I wonder: can we REALLY tell the difference between 1.5ms and 8ms of latency? Could anyone point to a reputable source that states the minimum latency humans can notice?
Everything I've read and experienced myself says that beyond 10ms it's noticeable, but what about below 10ms? Some people say they totally notice it, others don't... I wonder how much is hype and how much is real. This matters because lower latency settings put more stress on the system and probably mean running fewer instruments/effects, so why bother going lower if we don't really notice?
Below are some findings I've collected on the topic (latency in audio, and as a general issue in interactive systems).
--------------------------------------------------------------------------------
http://www.whirlwindusa.com/wwlatart.html
--------------------------------------------------------------------------------
Now, give this a try. Plug a microphone into a standard analog sound system and speak while standing 5 feet from the loudspeaker. You're now experiencing about 5 ms of latency. Step back another 5 feet, and you're experiencing about 10 ms of latency. Move the microphone a foot away from your mouth, and it adds another 1 ms.
Delays within an ensemble of musicians can be, and often are, relatively long. Think of a symphony where performers are located across a 40-foot stage. The conductor waves a baton to keep time. The percussion section might be 30 feet (and 30 ms) away, while the second violins are 5 feet (and 5 ms) away.
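To sanity-check the "about 1 ms per foot" figures quoted above, here is a rough Python sketch. It assumes a speed of sound of 343 m/s (dry air at roughly 20 °C), which is my assumption and not something the article states; the function name is just illustrative.

```python
# Rough check of the "1 ms per foot" rule of thumb.
SPEED_OF_SOUND_M_PER_S = 343.0  # assumption: dry air at about 20 degrees C
METERS_PER_FOOT = 0.3048


def acoustic_delay_ms(distance_ft: float) -> float:
    """Time in milliseconds for sound to travel distance_ft feet."""
    distance_m = distance_ft * METERS_PER_FOOT
    return distance_m / SPEED_OF_SOUND_M_PER_S * 1000.0


# Distances mentioned in the quoted article.
for feet in (1, 5, 10, 30, 40):
    print(f"{feet:>2} ft -> {acoustic_delay_ms(feet):.1f} ms")
```

Under that assumption 5 feet works out to roughly 4.4 ms, so the article's round numbers slightly overstate the delay but are in the right ballpark.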
Does the conductor hear all of the notes attacking at different times? The harps might be 40 feet (and 40 ms) away from the timpani. Do they think they sound out of time with each other? How do the musicians stay in synch with each other?
Actually, research sponsored by the National Science Foundation, through the Stanford University Department of Music, has shown that performers in an ensemble have no problem synchronizing with each other while experiencing latencies as high as 40 ms and even greater. In fact, latencies in the 10 ms to 20 ms range actually have a stabilizing effect on tempo and are thought to be preferred over zero latency. [2]
...
...
...
The Haas Effect (or Precedence Effect) is a principle first set forth in 1949 by Helmut Haas, which established that we humans localize sounds by identifying the difference in arrival time between our two ears. The same sound arriving within 25 ms to 35 ms of itself will be suppressed and not be heard as an echo.
Only if sounds are more than about 35 ms apart will the brain recognize them as separate sounds or echoes. We tested this by connecting a microphone through a 20 ms delay and monitoring with Shure E3 ear buds.
The principle here: when you monitor your own voice, the delayed sound mixes in the ear with non-delayed sound that is conducted through your head via bones, cartilage and Eustachian tubes. This effect should be exaggerated when using headphones or in-ear monitors, because the listener is not hearing all of the other room reflections with various delays and volumes.
Several people were asked to read aloud from a magazine. The subjects included experienced sound professionals and musicians, amateur players and non-technical people with no monitoring experience. Everyone heard the initial 20 ms delay as a very short echo or "doubled" sound.
Then, the delay was gradually reduced from 20 ms. The subjects were told to stop us when the delay seemed to disappear. Then this was repeated while the person spoke short, sharp syllables like "check, check!"
Every person tested seemed to think the echo disappeared somewhere between 10 ms and 15 ms. I personally found it to be a rather dramatic change too - as if someone had suddenly bypassed the delay unit.
...
...
...
Next, we wanted to evaluate the situation where a guitarist is playing direct to the PA, or a drummer is using electronic drums. I played guitar into the delay and monitored through headphones while only listening to the delay.
...
...
...
On the other hand, a delay of a few milliseconds was imperceptible. In my guitar experiment, it seems that the delay isn't noticeable at all up to about 10 ms (again). It becomes slightly noticeable between 10 ms and 15 ms, almost like it's not really an echo - just "something's there," but I could still play in time. The delay started to get difficult to contend with somewhere around 15 ms to 20 ms, and above 20 ms I really struggled with timing.
Now, this is an admittedly small sample. However, after these tests, it appears that even with the subjects being told that it's there, they couldn't detect latency as echoes with less than about 10 ms to 15 ms of latency. [3]
...
...
...
Any time a sound arrives at different times, the sound interacts with itself to affect the overall frequency response. This also affects response when a single sound reaches two microphones at slightly different locations at about the same level.
...
...
...
I'd venture that without this evidence, the general consensus would have been that less latency would always produce less comb effect and always be preferable. But a comparison of the waveforms from this section shows that the peaks are fewer and wider at 1 ms, and occur at frequencies that would affect the vocal range just as the latency at 5 ms or even 10 ms does.
In fact, if the comb filtering at 1 ms were producing unacceptable tone quality, a possible solution might be to actually add a few milliseconds of latency to shift the affected frequencies and move the peaks closer together.
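To put numbers on the comb filtering point above: assuming the direct and delayed signals are simply summed at equal level (my simplification, not the article's measurement setup), cancellations fall at odd multiples of 1 / (2 x delay). A rough Python sketch, where the function name and the 4 kHz upper limit are just illustrative choices:

```python
# Notch (cancellation) frequencies when a signal is summed with an
# equal-level copy of itself delayed by delay_ms milliseconds.
# Assumption: ideal summing, no level difference between the two paths.

def comb_notches_hz(delay_ms: float, max_hz: float = 4000.0) -> list[float]:
    """Return notch frequencies in Hz up to max_hz for the given delay."""
    notches = []
    k = 0
    while True:
        # Odd multiples of 1 / (2 * delay), with the delay given in ms.
        freq = (2 * k + 1) * 500.0 / delay_ms
        if freq > max_hz:
            break
        notches.append(freq)
        k += 1
    return notches


for delay in (1.0, 5.0, 10.0):
    found = comb_notches_hz(delay)
    print(f"{delay:>4} ms: {len(found)} notches below 4 kHz, first at {found[0]:.0f} Hz")
```

With these assumptions a 1 ms delay puts notches at 500, 1500, 2500 and 3500 Hz, while 5 ms gives one every 200 Hz starting at 100 Hz. So shorter delays mean fewer, wider notches that still land in the vocal range, and adding delay packs them closer together.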
...
...
...
When latency was 10ms - 15ms, although some people detected the presence of "something", they felt they could probably live with it. Others made faces. When the latency was 15ms - 20ms, more people heard an effect and felt that it was becoming distracting. The tolerance will vary from performer to performer and with the material performed as to when they will be unable to deal with it.
From Adobe: http://www.adobe.com/support/techdocs/331631.html
--------------------------------------------------------------------------------
General guidelines that apply to latency times.
Less than 10ms - allows real-time monitoring of incoming tracks including effects.
At 10ms - latency can be detected but can still sound natural and is usable for monitoring.
11-20ms - monitoring starts to become unusable, smearing of the actual sound source and the monitored output is apparent.
20-30ms - delayed sound starts to sound like an actual delay rather than a component of the original signal.
Note: The human ear is accustomed to latency because it occurs naturally in the world around us. The frequency of a sound, distance from the sound's source, and the physical properties of the human ear all play a part in when we hear a sound. However, the amount of latency introduced in the recording and monitoring process is due to the physical properties and limitations of the sound card, device drivers and processing power of the computer CPU.
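The Adobe bands above can be turned into a small lookup, which makes it easy to see where a given figure lands. This is only a sketch: the function name is made up, and since the quoted ranges leave gaps and overlaps (10 to 11 ms, exactly 20 ms, above 30 ms), how I handle those edges is my own assumption.

```python
# Map a monitoring latency to the quoted Adobe guideline band.
# Edge handling between bands is an assumption, not part of the source.

def adobe_guideline(latency_ms: float) -> str:
    """Return a short description of the guideline band for latency_ms."""
    if latency_ms < 10:
        return "allows real-time monitoring of incoming tracks including effects"
    if latency_ms < 11:  # assumption: 10 up to 11 ms counts as "at 10ms"
        return "can be detected but still sounds natural; usable for monitoring"
    if latency_ms < 20:  # assumption: exactly 20 ms goes to the 20-30 band
        return "monitoring starts to become unusable; smearing is apparent"
    if latency_ms <= 30:
        return "sounds like an actual delay rather than part of the original"
    return "outside the range covered by the quoted guidelines"


for ms in (1.5, 8, 10, 15, 25):
    print(f"{ms} ms: {adobe_guideline(ms)}")
```

Note that 1.5ms and 8ms fall into the same band here, which is really the question I'm asking.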
Sound on Sound: http://www.soundonsound.com/sos/apr99/a ... etency.htm
--------------------------------------------------------------------------------
Musicians will be most comfortable with a figure of 10ms or less - the equivalent latency value you find on MIDI gear between pressing a key on the keyboard and hearing the sound.
General talk about interactivity and latency.
http://www.stuartcheshire.org/papers/LatencyQuest.html
The previous article is from 1996, but the idea remains the same.
Interactivity
Given that other human perception parameters are known so accurately, you might think that the threshold of interactive response would be another well-known, widely studied parameter. Unfortunately it is not. For a number of years I've been interested in this question and I've asked a number of experts in the field of computer graphics and virtual reality, but so far none has been able to give me an answer. There seems to be a widespread belief that whatever the threshold of interactive response is, it is definitely not less than 15ms. I fear this number is motivated by wishful thinking rather than by any scientific evidence. 15ms is the time it takes to draw a single frame on a 66Hz monitor. Should it turn out that the threshold of interactive response is less than 15ms, it would be dire news for the entire virtual reality community. That would mean that there would be no way to build a convincing immersive virtual reality system using today's monitor technology, because today's monitor technology simply cannot display an image in under 15ms. If we find that convincing immersive virtual reality requires custom monitor hardware that runs at say, 200 frames per second, then it is going to make effective virtual reality systems prohibitively expensive.
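The 15ms figure in the passage above is just the frame period of a 66Hz display. A quick Python check of that arithmetic (the function name is illustrative):

```python
# Frame period for a given refresh rate: one frame takes 1000 / Hz milliseconds.

def frame_time_ms(refresh_hz: float) -> float:
    """Time in milliseconds needed to draw one frame at refresh_hz."""
    return 1000.0 / refresh_hz


# Refresh rates mentioned in the quoted article.
for hz in (66, 200):
    print(f"{hz} Hz -> {frame_time_ms(hz):.2f} ms per frame")
```

So 66Hz gives a little over 15ms per frame, and the hypothetical 200 frames per second display would bring that down to 5ms.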
...
...
...
100ms
As another data point, the rule of thumb used by the telephone industry is that the round-trip delay over a telephone call should be less than 100ms. This is because if the delay is much more than 100ms, the normal unconscious etiquette of human conversation breaks down. Participants think they hear a slight pause in the conversation and take that as their cue to begin to speak, but by the time their words arrive at the other end the other speaker has already begun the next sentence and feels that he or she is being rudely interrupted. When telephone calls go over satellite links the round-trip delay is typically about 250ms, and conversations become very stilted, full of awkward silences and accidental interruptions.
For these reasons I'm going to suggest 100ms as a target round-trip time for general network interaction. Some interactions may require a faster response time than this, but these interactions should hopefully be ones that can be computed locally without resorting to network communication. Some styles of interaction, like playing a game of chess, may work effectively with delays longer than 100ms, but we don't want to limit ourselves to only being able to support these limited kinds of interaction.
--------------------------------------------------------------------------------
Let's discuss!!!