Assignment 3: Spectral Processing

What we’re trying to do

What to turn in


The French mathematician Jean-Baptiste Fourier (a contemporary of Beethoven) discovered that any periodic waveform can be represented as the sum of one or more harmonically related sine waves, each with a fixed frequency, phase, and amplitude. He worked out the mathematics, later called the Fourier Transform, for performing this decomposition — as well as its inverse, for reconstructing a time-domain waveform from the analysis data. Fourier’s discovery led in the twentieth century to sophisticated methods of processing sound, using an extension of the Fourier Transform to cover digital, rather than analog, systems. The DFT (Discrete Fourier Transform) works with the discrete time intervals used in digital sampling systems. The FFT (Fast Fourier Transform) is a computationally efficient version of the DFT.
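Fourier’s idea is easy to demonstrate in code. The sketch below (Python with NumPy; the fundamental frequency, sample rate, and number of harmonics are arbitrary choices for illustration) approximates a square wave by summing its odd harmonics:

```python
import numpy as np

sr = 44100               # sample rate in Hz (illustrative)
f0 = 220.0               # fundamental frequency in Hz (illustrative)
t = np.arange(sr) / sr   # one second of sample times

# A square wave contains only odd harmonics, the nth with amplitude 1/n.
# Summing the first 20 odd harmonics gives a recognizable approximation.
square = np.zeros_like(t)
for n in range(1, 40, 2):
    square += np.sin(2 * np.pi * n * f0 * t) / n
```

Adding more harmonics sharpens the corners of the waveform; this is Fourier’s decomposition run in reverse.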

The FFT analyzes a time-domain waveform into a large number of frequency bands. The bands are equally spread across the frequency spectrum, from 0 Hz to the Nyquist frequency. The number of bands is typically 512, 1024, or 2048. (The FFT algorithm requires that the number be a power of two.) Each band represents a sine wave of a certain frequency, and the amplitudes of these frequency components constitute the most important part of the analysis. The FFT assumes that the sound you analyze is constant, not changing. But that’s not the kind of sound we’re most interested in, so FFT-based audio tools take many “snapshots” of the sound at equal time intervals. Each snapshot is an FFT analysis at one point in time. The results of the FFT analysis can be displayed in real time, giving you a graphical view of the timbral evolution of a sound, snapshot by snapshot — rather like a flip-book animation. FFT-based processing tools manipulate the FFT analysis data for each snapshot before resynthesizing a modified complex waveform. This resynthesis is the fun part.
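The snapshot process described above can be sketched in a few lines of Python with NumPy. The frame size and hop distance below are illustrative choices, not the defaults of any particular tool:

```python
import numpy as np

def stft_frames(signal, frame_size=1024, hop=512):
    """Slice a signal into equal windowed frames and FFT each one."""
    window = np.hanning(frame_size)
    frames = []
    for start in range(0, len(signal) - frame_size + 1, hop):
        frame = signal[start:start + frame_size] * window
        # rfft returns frame_size//2 + 1 bins, spanning 0 Hz to Nyquist
        frames.append(np.fft.rfft(frame))
    return np.array(frames)

sr = 44100
t = np.arange(sr) / sr
sig = np.sin(2 * np.pi * 440.0 * t)   # one second of a 440 Hz test tone
spectra = stft_frames(sig)
# Each row of spectra is one snapshot; np.abs(spectra) gives the
# bin amplitudes that a spectrogram would display over time.
```

For the steady test tone, every snapshot peaks in the same bin; for an evolving sound, the peak bins shift from one snapshot to the next, which is exactly what the flip-book view shows.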


SoundHack is a program that lets you perform various sound processing techniques that might not be available (at least with the same degree of flexibility) in commercial programs like Digital Performer and Pro Tools. SoundHack also lets you convert from one sound file format to another.

Download SoundHack for your own Mac here. (Sorry, no Windows version.)

A phase vocoder starts by taking a series of windowed FFT analysis frames (the snapshots mentioned above). Recall that this divides the spectrum into equal-sized frequency bins and shows the time-varying amplitude of these bins. (A spectrogram is a common visualization of this process.) The phase vocoder then processes the analysis data to determine the extent to which the actual frequency falling within a bin deviates from the bin’s fixed frequency. This time-varying frequency deviation lets the phase vocoder handle frequency more precisely than a simple FFT. Following manipulation of the phase vocoder data (the amplitude and frequency deviation for each bin), an inverse process resynthesizes a time-domain audio signal. The phase vocoder is typically used to perform pitch-shifting without affecting duration, or time-scaling without affecting pitch.
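As a rough sketch of how that frequency deviation is measured, the Python fragment below compares a bin’s phase in two successive FFT frames of a test tone whose frequency deliberately falls between bin centers. All parameter values are made up for illustration:

```python
import numpy as np

sr = 44100
frame_size = 1024
hop = 256
true_freq = 445.0     # deliberately off any bin's center frequency
t = np.arange(frame_size + hop) / sr
sig = np.sin(2 * np.pi * true_freq * t)

window = np.hanning(frame_size)
f1 = np.fft.rfft(sig[:frame_size] * window)           # first frame
f2 = np.fft.rfft(sig[hop:hop + frame_size] * window)  # one hop later

k = np.argmax(np.abs(f2))                     # bin with the most energy
bin_freq = k * sr / frame_size                # that bin's center frequency
expected = 2 * np.pi * k * hop / frame_size   # phase advance if the tone
                                              # sat exactly on the bin center
dphi = np.angle(f2[k]) - np.angle(f1[k]) - expected
dphi -= 2 * np.pi * np.round(dphi / (2 * np.pi))   # wrap to [-pi, pi]
est_freq = bin_freq + dphi * sr / (2 * np.pi * hop)
```

The estimated frequency lands much closer to the true 445 Hz than the bin’s 430.7 Hz center, which is the precision advantage the phase vocoder exploits.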

  1. Open a sound file in SoundHack. Choose the Hack > Phase Vocoder menu command.

  2. Set Bands to the number of FFT bands you want to use. A large number of bands yields better frequency resolution, while a small number of bands yields better time resolution.
  3. The Window popup menu allows you to choose different FFT window envelopes for different filtering characteristics. Stick with Hamming, von Hann, and Kaiser window types.
  4. The Overlap popup menu sets the amount of overlap between successive FFT frames. 1x means the frames are contiguous; 2x means one frame starts in the middle of the previous frame. The less overlap, the more fluttery the output will sound.
  5. Click the Time Scale button for time scaling, or the Pitch Scale button for pitch scaling. Enter the scale factor next to the Scaling popup menu (above the Pitch Scale button). This popup menu lets you enter desired length (for time scaling) or semitone transposition (for pitch scaling).

    If you want the time or pitch scaling factor to change during processing, click the Scaling Function check box, and then Edit Function. This brings up the Function Window editor, in which you can draw a time-varying scaling function.
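To see why the Overlap setting in step 4 matters, consider that resynthesis adds the windowed frames back together. The Python sketch below (frame size and frame count are arbitrary) overlap-adds von Hann window envelopes at 2x overlap, where the hop is half the frame:

```python
import numpy as np

frame_size = 1024
hop = frame_size // 2          # "2x" overlap: hop is half the frame
window = np.hanning(frame_size)

n_frames = 8
total = np.zeros(hop * (n_frames - 1) + frame_size)
for i in range(n_frames):
    total[i * hop : i * hop + frame_size] += window

# Away from the edges, the summed envelope is essentially flat:
middle = total[frame_size:-frame_size]
```

At 2x overlap the summed envelope is nearly constant; with contiguous (1x) frames the envelope would instead pulse between 0 and 1 at the frame rate, which is the fluttery sound mentioned in step 4.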

Convolution is a process that multiplies the spectra of two sound files. When this works well, you can achieve a kind of blend between the two sounds that is very different from simply mixing them together. For example, you might be able to impose whispering onto the sound of the seashore. For this to work, at least one of the sounds must have fairly broad-band energy (energy across the frequency spectrum). If one sound has lots of low-frequency energy, and the other has lots of high-frequency energy, then multiplying their spectra will give you basically nothing. (Multiplying anything by zeros gives you zero.) Convolution is typically implemented by taking FFTs of the two sounds, multiplying corresponding bins (a complex multiplication, which multiplies their amplitudes and adds their phases), and then taking the inverse FFT to return to a time-domain waveform.
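The bin-by-bin multiplication can be sketched in a few lines of Python with NumPy. The helper name fft_convolve and the toy arrays are invented for illustration; real use would operate on whole sound files:

```python
import numpy as np

def fft_convolve(a, b):
    """Convolve two signals by multiplying their complex spectra."""
    n = len(a) + len(b) - 1        # length of the full convolution
    A = np.fft.rfft(a, n)          # zero-pad both to a common length
    B = np.fft.rfft(b, n)
    return np.fft.irfft(A * B, n)  # complex bin-by-bin product, then inverse

x = np.array([1.0, 0.5, 0.25])
h = np.array([1.0, -1.0])
y = fft_convolve(x, h)
```

The result matches direct time-domain convolution (np.convolve), but for long signals the FFT route is dramatically faster.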

Another use of convolution is to apply a spatial characteristic to a sound. The impulse response of a space, such as a concert hall, is the sound made by making a very brief broad-band sound in that space, which then bounces off the walls and other surfaces, creating echoes and reverberation. Convolution lets you impose this ambience onto another sound, making it seem as if the sound is heard in that space. (This is what the DP ProVerb plug-in does.) For more about this use of convolution, see this article.

  1. Open a sound file in SoundHack. Choose Hack > Convolution. Click the Pick Impulse button to select a different impulse sound file.

    If you’re trying to impose reverb, try using the Voxengo impulse response files (IU network or VPN only).

    Or check out this extensive list of impulse response files.

  2. Click Process to write the convolution result to a file. Remember that the success of the output sound depends on the spectral characteristics of the source and impulse sound files. Trial and error is an essential part of working with convolution.

For more help with SoundHack, see the online manual (IU network or VPN only).


SPEAR (Sinusoidal Partial Editing Analysis and Resynthesis) is an excellent graphical implementation of the McAulay-Quatieri (MQ) sound analysis algorithm, which decomposes a sound into many sinusoidal partials. You edit these partials using a variety of tools (time scaling, frequency or pitch shifting, amplitude scaling, or simply drawing new partials) and then resynthesize them into a new sound file.
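The resynthesis half of this process amounts to additive synthesis: each partial is a sine wave with its own frequency and amplitude, and the output is their sum. The Python sketch below uses a made-up partial list (with fixed frequencies and amplitudes, where real MQ data would be time-varying) purely for illustration:

```python
import numpy as np

sr = 44100
dur = 0.5
t = np.arange(int(sr * dur)) / sr

# (frequency in Hz, amplitude) pairs standing in for analyzed partials;
# the third is deliberately inharmonic, as real partials often are
partials = [(220.0, 0.5), (440.0, 0.25), (663.0, 0.125)]

out = np.zeros_like(t)
for freq, amp in partials:
    out += amp * np.sin(2 * np.pi * freq * t)
```

Editing partials in SPEAR amounts to changing entries like these (deleting a partial, scaling its amplitude, shifting its frequency) before the sum is recomputed.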

Download SPEAR for your own computer here.

  1. Open a sound file in SPEAR, using File > Open. (Just press the Analyze button in the window that appears.) You can play and stop the sound by pressing the space bar.
  2. Experiment with the Controls window to change pitch, speed, etc.
  3. Use the Lasso Selection tool to select part of the spectrum. You can play only this part by holding down the shift key while pressing the space bar.
  4. Then use the tools in the palette, or the commands in the Transform menu, to manipulate the selected part of the spectrum.

  5. Save the results to a new sound file. No, you don’t do this by using the File > Save command. That saves a spectral analysis file in the SDIF format. Instead, use Sound > Synthesize to File.

©2010-2017, John Gibson, Christopher Cook