Using PCM samples as input for DFT

I am writing an application which will compute the DFT (using a FFT algorithm) of a sound signal. The inputs I have for the FFT algorithm are PCM samples - namely, I have a large list of 16-bit unsigned integers.

I am aware of the fact that I will need to compute the DFT of several segments of the sound signal independently using a window function, and I have already written working code which decodes an input sound file to raw PCM samples.

My question is about the definition of the DFT given on Wikipedia:

The DFT is supposed to perform an invertible, linear transformation on the inputs x(0), x(1), ..., x(N-1), where each x(n) is a complex number. However, I do not understand how to take my decoded sample integers, and turn them into complex numbers suitable for the algorithm.

I have seen certain examples online where each sample is divided to obtain a floating-point value in the range [0, 1), and then the imaginary part is set to 0.

Is this scaling down to [0, 1) necessary? And is representing each sample as x + 0i where x is the sample value correct?

Solution

Yes, you can create complex numbers by adding an imaginary part of 0 to each real value. Try that, it will work. However, you just doubled the amount of data to process and you created a lot of redundancy. You can notice the redundancy in the output: The resulting coefficients for the positive frequencies and for the negative frequencies will be identical, except for a different sign of the imaginary part. So to improve efficiency and reduce redundancy one typically uses a different transformation to turn N real values into N/2 complex values, and as a result you get (roughly) N/2 frequencies. I won't go into details here, but a nice implementation of both the complex FFT and the transformation for real input can be found here: http://sourceforge.net/projects/kissfft/

About your final question: No. You don't need to scale your input. The DFT is a linear transformation, so scaled input simply results in identically scaled output.

EDIT: BTW, are you sure it's a complex DFT what you want? For real data, in particular for PCM data, you should consider the Cosine Transform instead, which maps directly from real input data to real output.