Even under ideal conditions, it is difficult to grasp how sound relates to the world we live in. For most artists, creating good music is challenging enough without also investigating the physical properties of audio and perception. Yet award-winning sound artist David Hochgatterer has done exactly that. His recent installation, Time To X, uses a 96-channel speaker array to reveal the complex relationships between time, space, and sound. We asked David to lift the lid on the concepts and practicalities behind his work and explain what we can learn from understanding the building blocks of audio.
For those of us who are unfamiliar with the physics involved, can you explain the processes you used to turn audio time into audio space?
The idea behind Time To X is based on Einstein's model of space-time as a four-dimensional "hyperspace". Because it's hard to imagine or visualize more than the three dimensions we are used to living in, I decided to transfer the "time dimension" into a geometric extent: the x-axis.
A familiar example is a cinema filmstrip, where the frames are lined up one after another, quantized at 24 stills per second. Viewed separately, the individual pictures on the filmstrip lose their time factor, so I tried to find a way of creating an equivalent with audio. But sound consists of oscillation, so it is always based on a time factor, which makes it impossible to slow it down to a standstill.
So you applied the frame-by-frame concept used in film to the audio realm?
My solution was to quantize the sound file into segments corresponding to movie frames. Lined up in sequence and played back simultaneously, these frames give time (in the form of sound) a physical extent. Through a lot of experimenting, I found that a quantization of 20 "sound-frames" per second is enough to render speech and music clearly and intelligibly.
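To make the arithmetic concrete: 20 sound-frames per second means each frame lasts 1/20 s = 50 ms, the loop length David mentions in his next answer. The Python sketch below slices a mono signal into such frames. It is an illustration under our own assumptions (samples held in a plain list, and a sample rate such as 44.1 kHz that divides evenly by 20), not the tooling used for the installation; the function names are hypothetical.

```python
def frame_length(sample_rate, frames_per_second=20):
    """Number of samples in one "sound-frame" (1/frames_per_second seconds)."""
    if sample_rate % frames_per_second:
        raise ValueError("sample rate must divide evenly by the frame rate")
    return sample_rate // frames_per_second


def slice_into_frames(samples, sample_rate, frames_per_second=20):
    """Cut a mono signal into consecutive, non-overlapping frames.

    A trailing partial frame is dropped, so every frame has the same length.
    """
    n = frame_length(sample_rate, frames_per_second)
    return [samples[i:i + n] for i in range(0, len(samples) - n + 1, n)]


# One second of (dummy) audio at an assumed 44.1 kHz yields 20 frames
# of 2205 samples each, i.e. 50 ms per frame.
frames = slice_into_frames(list(range(44100)), 44100)
print(len(frames), len(frames[0]))
```

At 44.1 kHz each 50 ms frame holds 2205 samples; any other sample rate divisible by 20 (48 kHz gives 2400) works the same way.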
How did you assemble these split fragments of audio into an unbroken sequence of sound?
The next step was to address the fact that these frames are actually 50-millisecond loops, so they still behave as individual repetitions of short time periods. I wanted all the audio information in each frame to be present at once, so I needed to disperse it within each loop. I did this by smearing the sound in the time domain.
First, using the GRM Freeze delay plug-in, I multiplied every sound 128 times and spread the copies within its loop to get a grainy yet uniform sound. Then I recorded each of these files for about a minute and time-blurred the sound using the interpolation capabilities of iZotope RX3 Advanced. All the audio information was now constantly present, but the RX3 processing still left very slow, minimal variations. To avoid noticeable looping when all 96 sounds play back continuously in the installation, I made every file a different length and used Ableton Live's ability to play back loops asynchronously.
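The two ideas in this answer can be sketched in a few lines of Python. This is not a model of GRM Freeze or RX3: we assume that "multiplying a sound 128 times and spreading it within the loop" can be approximated by averaging 128 evenly offset, wrapped-around copies of the 50 ms frame. The second function shows why loops of different lengths avoid audible repetition: started together, they only line up again after the least common multiple of their lengths. All names and the example loop lengths are hypothetical; `math.lcm` with several arguments needs Python 3.9 or later.

```python
import math


def smear_loop(loop, copies=128):
    """Average `copies` circularly shifted copies of a loop (assumed smearing).

    Offsets are spaced evenly across the loop, so a single click turns
    into a uniform train of quieter clicks spread over the whole frame.
    """
    n = len(loop)
    if n == 0:
        return []
    out = [0.0] * n
    for k in range(copies):
        offset = (k * n) // copies  # evenly spaced start points inside the loop
        for i, sample in enumerate(loop):
            out[(i + offset) % n] += sample  # wrap around: the frame is a loop
    return [x / copies for x in out]  # normalize so the level doesn't grow


def combined_period(loop_lengths):
    """Time after which loops started together realign (same units as input)."""
    return math.lcm(*loop_lengths)


# Two hypothetical loop lengths in milliseconds: 60 s and 61 s.
print(combined_period([60000, 61000]))
```

With those two example lengths the pattern only repeats after 3,660,000 ms, i.e. 61 minutes, even though each loop is only about a minute long; with 96 different lengths the combined period grows far beyond any visit to the installation.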
The physical length of the speaker array now corresponds to the time span that is mapped onto it. All the acoustic information is permanently present, and it is possible to move physically through time by walking along the array: at any speed, forward or backward, and even to "stop in time".
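If we assume that each of the 96 speakers plays one 50 ms frame in order (the interview does not spell out this mapping), the array holds 96 / 20 = 4.8 seconds of source material. The sketch below turns that assumption into code and also estimates how fast a visitor "plays" the material by walking. The speaker spacing is a hypothetical parameter, not a figure from the installation.

```python
FRAMES_PER_SECOND = 20  # from the interview: 20 "sound-frames" per second
NUM_SPEAKERS = 96       # from the interview: 96-channel speaker array


def speaker_to_time(index, fps=FRAMES_PER_SECOND):
    """Start time (s) of the frame assumed to play on speaker `index`."""
    if not 0 <= index < NUM_SPEAKERS:
        raise ValueError("speaker index out of range")
    return index / fps


def array_duration(num_speakers=NUM_SPEAKERS, fps=FRAMES_PER_SECOND):
    """Total stretch of source time laid out along the array, in seconds."""
    return num_speakers / fps


def playback_rate(walking_speed_m_s, speaker_spacing_m, fps=FRAMES_PER_SECOND):
    """Walker's speed through source time relative to real time.

    1.0 = original speed, 0.0 = standing still ("stopping in time"),
    negative = walking backward. `speaker_spacing_m` is hypothetical.
    """
    speakers_per_second = walking_speed_m_s / speaker_spacing_m
    return speakers_per_second / fps


print(array_duration())            # seconds of audio along the array
print(playback_rate(10.0, 0.5))    # assumed 0.5 m spacing, brisk 10 m/s
```

Under these assumptions a listener would have to pass 20 speakers per second to hear the material at its original speed; ordinary walking speeds correspond to strongly slowed-down playback.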
What can we learn from the relationship between time and space as it relates to audio production and manipulation?
The most surprising discovery while working on Time To X was how few "sound-frames" per second are required to reproduce speech and music reasonably well. With just 20 frames per second, it's possible to play back a spoken sentence at a normal speaking rate with every consonant and transient clearly intelligible.
Another fascinating thing for me was that consonants like "T", "K" and "P", and transients in music, technically can't be reproduced when stretched to an infinite length. When the audio files are prepared, they become two different kinds of filtered noise. For instance, a "T" sounds a little like white noise and a "K" more like pink noise. What's impressive is that it's still possible to distinguish these consonants from an "F", "S" or "SCH" when walking quickly along the installation's speaker array.
My final conclusion from the whole experiment is that, although the human ear can detect frequencies up to nearly 20 kHz, the acoustic variations of speech or music can be quantized relatively slowly (20 frames per second in Time To X), and we still recognize most of the information in what we hear.