If you were to take, for example, Trap Nation's music visualizer (example) and look at just the "intensity" they use to control the undulation of the circle, how is that calculated? I tried writing my own version, and I was simply using the waveform with a little bit of smoothing and filtering, but I still got a large amount of added noise in my data (noise which you can't hear at all). For example, I made a little sample audio file using Garage Band and the clarinet instrument. I noticed that my visualizer (as well as Audacity) would show the volume increasing and decreasing at a rapid rate even when nothing was playing. Am I getting my terms mixed up? Do these types of music visualizers simply use the waveform with more filtering, or do they use some completely different measure?
This is an example of what I mean: in the test file I created, the volume over the past ~0.8 seconds, and continuing for the next ~0.2 seconds, decreased like a decaying sine wave.
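
For reference, here is a minimal sketch of the kind of smoothed-waveform intensity I was computing, plus a simple noise gate, which is one way to suppress the inaudible fluctuations I described. The frame size, smoothing factor, and gate threshold are just illustrative values, not anything taken from a real visualizer:

```python
# Minimal sketch: per-frame RMS with one-pole smoothing and a noise gate.
# Assumes mono float samples in [-1, 1]; all parameters are illustrative.
import numpy as np

def rms_envelope(samples, frame_size=1024, alpha=0.2, gate=1e-4):
    """Per-frame RMS amplitude, gated and exponentially smoothed."""
    n_frames = len(samples) // frame_size
    intensity = np.empty(n_frames)
    smoothed = 0.0
    for i in range(n_frames):
        frame = samples[i * frame_size:(i + 1) * frame_size]
        rms = np.sqrt(np.mean(frame ** 2))
        if rms < gate:          # suppress the near-silent noise floor
            rms = 0.0
        # One-pole low-pass: smaller alpha = heavier smoothing.
        smoothed = alpha * rms + (1.0 - alpha) * smoothed
        intensity[i] = smoothed
    return intensity

# Synthetic test: a decaying 440 Hz tone followed by half a second of silence.
sr = 44100
t = np.linspace(0, 1, sr, endpoint=False)
signal = np.exp(-3 * t) * np.sin(2 * np.pi * 440 * t)
signal = np.concatenate([signal, np.zeros(sr // 2)])
print(rms_envelope(signal)[:5])
```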
Forms of the Fourier transform (DFT/FFT/STFT/DCT) are commonly used software transformations of audio for visualization, but so are others, such as wavelet decompositions, cepstra, MFCCs, autocorrelation, etc.
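
As a rough illustration, here is a sketch of one common FFT-based approach: take the magnitude spectrum of each windowed frame and use the energy in a low-frequency band as the intensity. The band edges, window, and hop size here are assumptions; Trap Nation's actual pipeline isn't public:

```python
# Hedged sketch of an FFT-based intensity: per-frame energy in a
# low-frequency band. Band edges, window, and hop are assumptions,
# not any particular visualizer's actual parameters.
import numpy as np

def bass_intensity(samples, sr, frame_size=2048, hop=512,
                   band=(20.0, 150.0)):
    """Per-hop magnitude-spectrum energy within a frequency band."""
    window = np.hanning(frame_size)
    freqs = np.fft.rfftfreq(frame_size, d=1.0 / sr)
    mask = (freqs >= band[0]) & (freqs <= band[1])
    out = []
    for start in range(0, len(samples) - frame_size + 1, hop):
        frame = samples[start:start + frame_size] * window
        mag = np.abs(np.fft.rfft(frame))
        out.append(np.sum(mag[mask] ** 2))  # energy in the chosen band
    return np.asarray(out)
```

Because the band energy ignores everything outside the chosen band, a near-silent broadband noise floor (like the fluctuations you measured during silence) moves it far less than it moves raw waveform amplitude, which is one reason spectral measures tend to look smoother in visualizers.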