Making music visuals — Part 3

Building an audio analyzer

Transcript

For the video mixer, we used videos to put motion into the visuals — remember, motion is very important for music visuals. Now we’re going to get motion in a different way, from the music audio. We’ll feed the same audio signal that goes to the speakers into Vuo and use properties of the audio to control the visuals.

I like this approach because it directly ties the visuals to the music. When you show a visualization of the audio, it matches what your ears are hearing (like now) and, I think, helps you notice things in the music that you might not have otherwise.

A little physics lesson here — by the way, I’m not a physicist. Sound travels through the air in the form of particles bumping up against each other, and these form a series of tiny high- and low-pressure zones that are called sound waves. When the sound waves wash up against your eardrum, they make it vibrate like a drumhead. That’s fast; that’s slow motion. If you were to graph the displacement of the eardrum over time, it would look something like this.

An audio wave graph makes an excellent starting point for a music visual, so there’s a built-in node for it in Vuo. We just have to pull in the audio, send it through the Make Waveform Image node, and render it onto a window. Since my computer’s not hooked up to any audio inputs, I’m just pulling it in through the built-in mic. That’s just rendering the waveform.

Each image output by the Make Waveform Image node is a graph representing the displacement of your eardrum over a very short amount of time, about a hundredth of a second. That’s sending a stream of images like that.
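To put a number on that “hundredth of a second”: the duration covered by one image is just the audio buffer size divided by the sample rate. Here’s a quick back-of-the-envelope sketch in Python — the 48 kHz rate and 512-sample buffer are common defaults I’m assuming for illustration, not values taken from Vuo itself:

```python
# Rough arithmetic: how much time one waveform image covers.
# Assumes a 48 kHz sample rate and a 512-sample audio buffer --
# common defaults, not values from Vuo.
sample_rate = 48_000   # samples per second
buffer_size = 512      # samples delivered per audio event

window_seconds = buffer_size / sample_rate
print(f"{window_seconds:.4f} s per image")  # -> 0.0107 s per image
```

So each image really does graph only about a hundredth of a second of eardrum motion, which is why the stream of images updates so quickly.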

We could build up some music visuals based on the audio waveform. Since we’re projecting on a dome, we need to not only add the Make Waveform Image node, but we need to resize the image to make it a square and then we need to vignette it to make it a circle. We could also add some effects, just like we did for the video mixer. I won’t take you through it step-by-step here because I want to have time to show more visualizations, but here’s a composition that has an effect, and here’s how it would look on the dome. It’s working! So that’s a waveform visualization.
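The vignette step is just a circular mask: after squaring the image, keep the pixels inside the inscribed circle and black out the rest. Here’s a miniature sketch of that masking idea in plain Python — this is only the concept, not Vuo’s actual vignette node:

```python
import math

# A circular vignette in miniature: for each pixel of a square image,
# keep it if it falls inside the inscribed circle, drop it otherwise.
# Illustrates the masking idea only -- not Vuo's actual node.
def vignette_mask(size):
    center = (size - 1) / 2   # center of the pixel grid
    radius = size / 2         # inscribed circle's radius
    mask = []
    for y in range(size):
        row = []
        for x in range(size):
            d = math.hypot(x - center, y - center)
            row.append(1.0 if d <= radius else 0.0)
        mask.append(row)
    return mask

m = vignette_mask(8)
print(m[0][0], m[4][4])  # corner is outside the circle, center is inside
```

A real implementation would fade the edge smoothly instead of cutting it off, but the dome only shows what lands inside that circle either way.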

The next visualization involves separating audio into its low, mid, and high frequencies, roughly corresponding to the low-, mid-, and high-pitched sounds. There is a mathematical way to do this, and it’s based on another physical property of sound.

When you whistle, it makes a sound wave that’s close to the very simplest sound wave, what in math is called a sine wave. A pure sine wave sounds like this. Most musical instruments make more complex sound waves. The different shapes of the sound waves are what give each instrument its unique character.

It turns out (as we say in math) that the more complex sound waves that we hear are actually a combination of simpler sound waves generated by the instrument. For example, if you were to play the A above middle C on a guitar, the entire length of the string, anchored at the two ends, would vibrate at a frequency of about 440 times per second (that’s 440, not 400, by the way). Now, if that’s all that happened, then a guitar would sound like a sine wave. But the string’s motion is more complex than that. Smaller waves also form, subdividing the string into halves, thirds, fourths, and so on. Each of these waves adds its own higher frequency, called a harmonic. Add those waves all together, plus the vibration of the body of the guitar, and you get a complex wave, which has the unique timbre of a guitar.
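The “add those waves all together” step is easy to sketch in code. Here’s a toy version in Python: a 440 Hz fundamental plus harmonics at 880, 1320, and 1760 Hz. The amplitudes are made up for illustration — a real guitar’s harmonic mix is more complicated:

```python
import math

# Sketch of a "guitar-like" wave: a 440 Hz fundamental plus a few
# harmonics. The amplitudes (1.0, 0.5, 0.25, 0.125) are invented
# for illustration -- real instruments have their own mix.
SAMPLE_RATE = 8000  # samples per second, arbitrary for this sketch

def complex_wave(t):
    """Displacement at time t: a sum of sines at harmonic frequencies."""
    harmonics = [(440.0, 1.0), (880.0, 0.5), (1320.0, 0.25), (1760.0, 0.125)]
    return sum(amp * math.sin(2 * math.pi * freq * t) for freq, amp in harmonics)

# Sample the combined wave, as a sound card would.
samples = [complex_wave(n / SAMPLE_RATE) for n in range(256)]
```

Graph `samples` and you get a repeating shape that’s clearly not a simple sine wave — that shape is the timbre.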

There’s a mathematical technique for taking a complex sound wave and separating it out into its simple sine waves, called a Fourier transform. You end up with a graph where the horizontal axis represents the frequency and the vertical axis represents the amount of the complex sound wave that came from a simple wave at that frequency. With the guitar example, most of the complex wave came from the A note at a frequency of 440 Hz, and lesser parts came from vibrations at higher frequencies.
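To make the Fourier transform concrete, here’s a naive discrete Fourier transform in Python: for each frequency bin, it measures how much of the signal correlates with a complex sinusoid at that frequency. Fed a pure 440 Hz sine wave, the peak lands squarely at 440 Hz. The sample rate and length are made-up numbers chosen so 440 Hz falls exactly on a bin; real analyzers use the much faster FFT algorithm:

```python
import cmath
import math

# Naive discrete Fourier transform: the magnitude of each bin says how
# much of the signal came from a sinusoid at that bin's frequency.
# O(n^2), fine for a sketch; real analyzers use the FFT instead.
def dft_magnitudes(samples):
    n = len(samples)
    return [abs(sum(samples[t] * cmath.exp(-2j * math.pi * k * t / n)
                    for t in range(n))) / n
            for k in range(n // 2)]

# A pure 440 Hz sine, sampled at 8800 Hz for 100 samples -- numbers
# chosen so 440 Hz lands exactly on bin 5 (bin spacing = 8800/100 = 88 Hz).
rate, n = 8800, 100
signal = [math.sin(2 * math.pi * 440 * t / rate) for t in range(n)]
mags = dft_magnitudes(signal)
peak_bin = max(range(len(mags)), key=lambda k: mags[k])
print(peak_bin * rate / n)  # -> 440.0
```

For the guitar example, the same analysis would show the tallest peak at 440 Hz with smaller peaks at the harmonics above it.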

Like the waveform graph I showed earlier, the Fourier transform graph is another good starting point for a music visualization. In Vuo, there’s a node that calculates the Fourier transform called Calculate Amplitude for Frequencies, and it outputs the list of heights from that Fourier transform graph. You can take these numbers and plot them on the screen for a spectrum analyzer effect. (This is a movie; this is not live sound. I was just making some noise by rubbing my fingers across the microphone.)
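To get the low/mid/high split mentioned earlier from that list of heights, one simple approach is to sum the magnitudes within frequency ranges. Here’s a sketch in Python — the 250 Hz and 4 kHz cutoffs are my own rough choices, not anything from Vuo:

```python
def band_energies(magnitudes, sample_rate, low_cut=250.0, high_cut=4000.0):
    """Split Fourier magnitudes into low/mid/high sums.

    Assumes magnitudes[k] spans frequencies from 0 up to sample_rate/2.
    The 250 Hz / 4 kHz cutoffs are illustrative, not Vuo's.
    """
    n_bins = len(magnitudes)
    bin_hz = (sample_rate / 2) / n_bins  # width of each frequency bin
    low = mid = high = 0.0
    for k, m in enumerate(magnitudes):
        freq = k * bin_hz
        if freq < low_cut:
            low += m
        elif freq < high_cut:
            mid += m
        else:
            high += m
    return low, mid, high
```

Each of the three sums can then drive a separate visual parameter — say, low frequencies pulsing the size of a shape while highs drive its brightness.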

As with previous compositions, we should vignette the image to restrict it to the dome, and we can add various effects to make it more interesting. Steve’s showing an example of how that could look. We need some music; who wants to sing? (Somebody plays a harmonica.)

You could riff on the audio waveform or Fourier transform in a lot of different ways to make audio-reactive visuals. These audio graphs, along with the video mixer we demoed earlier, are setups that provide some underlying motion to build on for making music visuals.