Electronic Studio Resources II: Assignment 3 — Spectral Processing
The French mathematician Jean-Baptiste Joseph Fourier (a contemporary of Beethoven) discovered that any periodic waveform can be represented as the sum of one or more harmonically related sine waves, each with a fixed frequency, phase, and amplitude. He worked out the mathematics, later called the Fourier Transform, for performing this decomposition, as well as its inverse, for reconstructing a time-domain waveform from the analysis data. Fourier’s discovery led in the twentieth century to sophisticated methods of processing sound, using an adaptation of the Fourier Transform to sampled (digital) rather than continuous (analog) signals. The DFT (Discrete Fourier Transform) works with the discrete time intervals used in digital sampling systems. The FFT (Fast Fourier Transform) is a computationally efficient algorithm for computing the DFT.
The FFT analyzes a time-domain waveform into a large number of frequency bands. The bands are equally spaced across the frequency spectrum, from 0 Hz to the Nyquist frequency (half the sampling rate). The number of bands is typically 512, 1024, or 2048. (The most common FFT algorithm requires that the number be a power of two.) Each band represents a sine wave of a certain frequency, and the amplitudes of these frequency components constitute the most important part of the analysis. The FFT treats the stretch of sound it analyzes as if its spectrum were constant, not changing. But that’s not the kind of sound we’re most interested in, so FFT-based audio tools take many short “snapshots” of the sound at equal time intervals. Each snapshot is an FFT analysis of a brief segment of the sound at one point in time. The results of the FFT analysis can be displayed in real time, giving you a graphical view of the timbral evolution of a sound, snapshot by snapshot, rather like a flip-book animation. FFT-based processing tools manipulate the FFT analysis data for each snapshot before resynthesizing a modified complex waveform. This resynthesis is the fun part.
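The snapshot idea can be sketched in a few lines of Python with NumPy. This is only an illustration, not how SoundHack or any other tool is implemented: the function name stft_magnitudes, the frame size of 1024 samples, the hop of 256 samples, and the Hann window are all assumptions chosen for the example. Note that analyzing 1024 samples yields 513 bands (0 Hz through the Nyquist frequency).

```python
import numpy as np

def stft_magnitudes(signal, frame_size=1024, hop=256):
    """Return one row of band amplitudes per analysis frame ("snapshot")."""
    window = np.hanning(frame_size)            # taper each frame to reduce leakage
    n_frames = 1 + (len(signal) - frame_size) // hop
    frames = np.empty((n_frames, frame_size // 2 + 1))
    for i in range(n_frames):
        chunk = signal[i * hop : i * hop + frame_size] * window
        frames[i] = np.abs(np.fft.rfft(chunk))  # amplitude of each band
    return frames

# Test signal (hypothetical): one second gliding from 440 Hz up to 880 Hz.
sample_rate = 44100
t = np.arange(sample_rate) / sample_rate
freq = np.linspace(440.0, 880.0, len(t))
tone = np.sin(2 * np.pi * np.cumsum(freq) / sample_rate)

mags = stft_magnitudes(tone)
band_width = sample_rate / 1024                 # spacing between bands in Hz
peak_hz = mags.argmax(axis=1) * band_width      # loudest band in each snapshot
```

Reading down peak_hz shows the loudest band climbing from frame to frame, which is the flip-book view of the glide. The frequency values can only land on multiples of the band width (about 43 Hz here), which is the limitation the phase vocoder addresses.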
SoundHack is a program that lets you apply various sound processing techniques that might not be available (at least with the same degree of flexibility) in commercial programs like Digital Performer and Pro Tools. SoundHack also lets you convert from one sound file format to another.
Download SoundHack for your own Mac here. (Sorry, no Windows version.)
A phase vocoder starts by taking a series of windowed FFT analysis frames (the snapshots mentioned above). Recall that this divides the spectrum into equal-sized frequency bins (the bands described above) and shows the time-varying amplitude of these bins. (A spectrogram is a common visualization of this process.) The phase vocoder then compares the phase of each bin from one frame to the next to determine the extent to which the actual frequency falling within a bin deviates from the bin’s center frequency. This time-varying frequency deviation lets the phase vocoder handle frequency more precisely than a simple FFT. Following manipulation of the phase vocoder data (the amplitude and frequency deviation for each bin), an inverse process resynthesizes a time-domain audio signal. The phase vocoder is typically used to perform pitch-shifting without affecting duration, or time-scaling without affecting pitch.
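The frequency-deviation step can be sketched as follows, in Python with NumPy. This is a minimal illustration of the analysis half only, under assumed settings (frame size 1024, hop 256, Hann window); the function name bin_frequencies is hypothetical, and this is not SoundHack’s code. The idea is that a sine wave sitting exactly on a bin’s center frequency advances its phase by a predictable amount between two frames, so any extra phase advance reveals how far the actual frequency is from the center.

```python
import numpy as np

def bin_frequencies(signal, sample_rate, frame_size=1024, hop=256, start=0):
    """Estimate amplitude and actual frequency for every bin from two frames."""
    window = np.hanning(frame_size)
    first = np.fft.rfft(signal[start : start + frame_size] * window)
    second = np.fft.rfft(signal[start + hop : start + hop + frame_size] * window)

    k = np.arange(len(first))                        # bin numbers
    expected = 2 * np.pi * k * hop / frame_size      # phase advance at bin center
    deviation = np.angle(second) - np.angle(first) - expected
    deviation = (deviation + np.pi) % (2 * np.pi) - np.pi   # wrap to [-pi, pi)

    # Convert the leftover phase advance into a frequency offset in Hz.
    bin_width = sample_rate / frame_size
    actual = (k + deviation * frame_size / (2 * np.pi * hop)) * bin_width
    return np.abs(second), actual

# Test signal (hypothetical): a 1000 Hz sine, which falls between bin centers.
sample_rate = 44100
t = np.arange(4096) / sample_rate
sine = np.sin(2 * np.pi * 1000.0 * t)

amps, freqs = bin_frequencies(sine, sample_rate)
peak = int(amps.argmax())
center_hz = peak * sample_rate / 1024    # what a plain FFT would report
estimated_hz = freqs[peak]               # what the phase comparison reports
```

Here center_hz is off by several Hz because the loudest bin’s center does not coincide with 1000 Hz, while estimated_hz lands very close to it. A full phase vocoder would go on to modify these amplitude and frequency values and resynthesize, which this sketch leaves out.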
If you want the time or pitch scaling factor to change during processing, click the Scaling Function check box, and then Edit Function. This brings up the Function Window editor, in which you can draw a time-varying scaling function.
Convolution is a process that combines two sound files in a way that is equivalent to multiplying their spectra. When this works well, you can achieve a kind of blend between the two sounds that is very different from simply mixing them together. For example, you might be able to impose whispering onto the sound of the seashore. For this to work, the two sounds must have energy in the same frequency regions, which is most likely when at least one of them has fairly broad-band energy (energy across the frequency spectrum). If one sound has only low-frequency energy, and the other has only high-frequency energy, then multiplying their spectra will give you basically nothing. (Multiplying anything by zero gives you zero.) Convolution is typically implemented by taking FFTs of the two sounds, multiplying the values of corresponding bins (as complex numbers, so that the amplitudes multiply and the phases add), and then taking the inverse FFT to return to a time-domain waveform. The FFTs must be long enough to hold the full result, which is about as long as the two sounds put together.
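The FFT recipe for convolution can be sketched in Python with NumPy. This is a bare-bones illustration, not SoundHack’s implementation; the function name fft_convolve and the test signals are made up for the example. The sounds are zero-padded to a power-of-two length long enough to hold the whole result, and the bins are multiplied as complex numbers.

```python
import numpy as np

def fft_convolve(a, b):
    """Convolve two signals by multiplying their spectra bin by bin."""
    n = len(a) + len(b) - 1                  # length of the full result
    size = 1 << (n - 1).bit_length()         # next power of two >= n
    spectrum = np.fft.rfft(a, size) * np.fft.rfft(b, size)  # complex multiply
    return np.fft.irfft(spectrum, size)[:n]

# The FFT route matches direct time-domain convolution.
rng = np.random.default_rng(0)
x = rng.standard_normal(300)
y = rng.standard_normal(50)
same = np.allclose(fft_convolve(x, y), np.convolve(x, y))

# Hypothetical test tones: one low (200 Hz), one high (5000 Hz), each faded
# in and out. Their spectra barely overlap, so convolving them gives almost
# nothing compared with convolving the low tone with itself.
sample_rate = 44100
t = np.arange(4096) / sample_rate
fade = np.hanning(4096)
low = np.sin(2 * np.pi * 200.0 * t) * fade
high = np.sin(2 * np.pi * 5000.0 * t) * fade
ratio = np.abs(fft_convolve(low, high)).max() / np.abs(fft_convolve(low, low)).max()
```

The same function imposes reverb when b is an impulse response. In practice the result usually needs to be scaled down before it is written to a sound file, since convolution can produce very large sample values.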
Another use of convolution is to apply a spatial characteristic to a sound. The impulse response of a space, such as a concert hall, is a recording of what happens when a very brief broad-band sound is produced in that space and then bounces off the walls and other surfaces, creating echoes and reverberation. Convolution lets you impose this ambience onto another sound, making it seem as if the sound is heard in that space. (This is what the DP ProVerb plug-in does.) For more about this use of convolution, see this article.
If you’re trying to impose reverb, try using the Voxengo impulse response files (IU network or VPN only).
Or check out this extensive list of impulse response files.
For more help with SoundHack, see the online manual (IU network or VPN only).
SPEAR (Sinusoidal Partial Editing Analysis and Resynthesis) is an excellent graphical implementation of the McAulay-Quatieri (MQ) sound analysis algorithm, which decomposes a sound into many sinusoidal partials. You edit these partials using a variety of tools (time scaling, frequency or pitch shifting, amplitude scaling, or simply drawing new partials) and then resynthesize them into a new sound file.
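To make the idea of editable partials concrete, here is a rough Python sketch using NumPy. It is not SPEAR’s file format or resynthesis method: the breakpoint representation (lists of times, frequencies, and amplitudes per partial), the function name resynthesize, and the use of simple linear interpolation are all assumptions made for illustration. Each partial drives one sine oscillator whose frequency and amplitude follow the breakpoints, and the oscillators are summed.

```python
import numpy as np

def resynthesize(partials, duration, sample_rate=44100):
    """Sum one sine oscillator per partial, following its breakpoints."""
    t = np.arange(int(duration * sample_rate)) / sample_rate
    out = np.zeros(len(t))
    for times, freqs, amps in partials:
        freq = np.interp(t, times, freqs)                    # Hz at each sample
        amp = np.interp(t, times, amps, left=0.0, right=0.0) # silent outside
        phase = 2 * np.pi * np.cumsum(freq) / sample_rate    # running phase
        out += amp * np.sin(phase)
    return out

# Two hypothetical partials, each given as (times, frequencies, amplitudes).
partials = [
    ([0.0, 0.5, 1.0], [220.0, 220.0, 220.0], [0.0, 0.5, 0.0]),
    ([0.0, 0.5, 1.0], [440.0, 450.0, 440.0], [0.0, 0.25, 0.0]),
]
sound = resynthesize(partials, 1.0)

# An "edit": transpose every partial up a perfect fifth, duration unchanged.
shifted = [(times, [f * 1.5 for f in freqs], amps)
           for times, freqs, amps in partials]
higher = resynthesize(shifted, 1.0)
```

Because every partial is just a list of numbers, operations like time scaling, pitch shifting, and amplitude scaling amount to simple arithmetic on those lists before resynthesis.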
Download SPEAR for your own computer here.