Taking fft / ifft of a stereo signal in numpy?

This question is related to both Apply FFT to a both channels of a stereo signal separately? and How to represent stereo audio data for FFT, but specifically for numpy's fft package.

How do I take the FFT of a (real-valued) stereo signal in numpy, and how do I get it back to the time domain?

Solution

If your stereo data is in two columns (i.e. left channel in column 0 and right channel in column 1), you can do it in a single operation. np.fft.rfft() transforms along the last axis by default, so you only need to transpose the data first. To demonstrate:

Here are two channels of data, fourteen samples long. The left is a cosine wave at f1 (it completes one cycle in the fourteen samples), the right is a cosine wave at f2 (it completes two cycles):

import numpy as np

s = np.array([[ 0.14285714,  0.14285714],
              [ 0.12870984,  0.08906997],
              [ 0.08906997, -0.0317887 ],
              [ 0.0317887 , -0.12870984],
              [-0.0317887 , -0.12870984],
              [-0.08906997, -0.0317887 ],
              [-0.12870984,  0.08906997],
              [-0.14285714,  0.14285714],
              [-0.12870984,  0.08906997],
              [-0.08906997, -0.0317887 ],
              [-0.0317887 , -0.12870984],
              [ 0.0317887 , -0.12870984],
              [ 0.08906997, -0.0317887 ],
              [ 0.12870984,  0.08906997]])
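
The array does not have to be typed in by hand. A minimal sketch that regenerates it, assuming the values are cosines of amplitude 1/7 sampled at 14 points (which matches the numbers shown); the names n_samples, k, left and right are illustrative only:

```python
import numpy as np

n_samples = 14                  # samples per channel
k = np.arange(n_samples)        # sample indices 0..13

# Amplitude 1/7 equals 2/n_samples, so each rfft peak comes out as 1.
left = np.cos(2 * np.pi * 1 * k / n_samples) / 7    # one cycle
right = np.cos(2 * np.pi * 2 * k / n_samples) / 7   # two cycles

# Column 0 is the left channel, column 1 is the right channel.
s = np.column_stack([left, right])
```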

If you transpose it (so left channel is row 0 and right channel is row 1), you can then pass it directly to np.fft.rfft(), which transforms each row independently:

>>> s_t = s.transpose()
>>> s_t
array([[ 0.14285714,  0.12870984,  0.08906997,  0.0317887 , -0.0317887 ,
        -0.08906997, -0.12870984, -0.14285714, -0.12870984, -0.08906997,
        -0.0317887 ,  0.0317887 ,  0.08906997,  0.12870984],
       [ 0.14285714,  0.08906997, -0.0317887 , -0.12870984, -0.12870984,
        -0.0317887 ,  0.08906997,  0.14285714,  0.08906997, -0.0317887 ,
        -0.12870984, -0.12870984, -0.0317887 ,  0.08906997]])
>>> f = np.fft.rfft(s_t)
>>> np.set_printoptions(suppress=True)  # make it easier to read
>>> f
array([[ 0.+0.j,  1.+0.j,  0.+0.j, -0.-0.j,  0.-0.j, -0.+0.j,  0.+0.j, 0.+0.j],
       [-0.+0.j,  0.+0.j,  1.+0.j, -0.-0.j,  0.-0.j,  0.+0.j, -0.+0.j, 0.+0.j]])
>>>
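
The transpose is only needed because np.fft.rfft() works along the last axis by default. As an alternative sketch, passing axis=0 transforms the column-format data directly; the test signal is regenerated here under the assumption that it is a pair of amplitude-1/7 cosines, and f_cols / f_rows are illustrative names:

```python
import numpy as np

k = np.arange(14)
s = np.column_stack([np.cos(2 * np.pi * k / 14),
                     np.cos(4 * np.pi * k / 14)]) / 7

# Transform along the sample axis (rows), one FFT per column/channel.
f_cols = np.fft.rfft(s, axis=0)

# The transpose-first approach, for comparison.
f_rows = np.fft.rfft(s.T)

# Both give the same bins, just laid out differently.
same = np.allclose(f_cols, f_rows.T)
```

With axis=0 the result is already in column format (8 bins by 2 channels), so no transpose is needed afterwards either.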

You can see from above that the left channel (row 0) has a '1' in bin 1 and the right channel (row 1) has a '1' in bin 2, which is what we'd expect. If you want your frequency data to be in column format, of course you can transpose that. And if you want just the real components, you can do that at the same time:

>>> f.transpose().real
array([[ 0., -0.],
       [ 1.,  0.],
       [ 0.,  1.],
       [-0., -0.],
       [ 0.,  0.],
       [-0.,  0.],
       [ 0., -0.],
       [ 0.,  0.]])
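
One caveat: taking .real discards the imaginary part, which is harmless here only because both channels are pure cosines. For general audio, the magnitude from np.abs() is usually the safer thing to inspect. A small sketch, assuming (purely for illustration) that the left channel is a sine instead of a cosine:

```python
import numpy as np

k = np.arange(14)
left = np.sin(2 * np.pi * k / 14) / 7     # sine: energy lands in the imaginary part
right = np.cos(4 * np.pi * k / 14) / 7    # cosine: energy lands in the real part
s = np.column_stack([left, right])

f = np.fft.rfft(s, axis=0)

real_part = f.real       # misses the left channel's peak in bin 1
magnitude = np.abs(f)    # shows both peaks regardless of phase
```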

To confirm that this is a proper transform of our original stereo data, compare this to s (above). np.fft.irfft() already returns real values, so no .real is needed:

>>> np.fft.irfft(f).transpose()
array([[ 0.14285714,  0.14285714],
       [ 0.12870984,  0.08906997],
       [ 0.08906997, -0.0317887 ],
       [ 0.0317887 , -0.12870984],
       [-0.0317887 , -0.12870984],
       [-0.08906997, -0.0317887 ],
       [-0.12870984,  0.08906997],
       [-0.14285714,  0.14285714],
       [-0.12870984,  0.08906997],
       [-0.08906997, -0.0317887 ],
       [-0.0317887 , -0.12870984],
       [ 0.0317887 , -0.12870984],
       [ 0.08906997, -0.0317887 ],
       [ 0.12870984,  0.08906997]])
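
Rather than comparing the printed arrays by eye, the round trip can be checked with np.allclose(). The sketch below also passes n explicitly to np.fft.irfft(): by default it assumes an even output length of 2*(m-1) for m bins, which is right for these 14 samples but would silently drop a sample for an odd-length signal. The signal is regenerated under the assumption of amplitude-1/7 cosines, and s_odd is a hypothetical 13-sample variant used only to show the difference:

```python
import numpy as np

k = np.arange(14)
s = np.column_stack([np.cos(2 * np.pi * k / 14),
                     np.cos(4 * np.pi * k / 14)]) / 7

# Even length: forward and inverse transform along the sample axis.
f = np.fft.rfft(s, axis=0)
s_back = np.fft.irfft(f, n=s.shape[0], axis=0)
round_trip_ok = np.allclose(s, s_back)

# Odd length: the default output length would be 12, not 13.
s_odd = s[:13]
f_odd = np.fft.rfft(s_odd, axis=0)
default_len = np.fft.irfft(f_odd, axis=0).shape[0]
s_odd_back = np.fft.irfft(f_odd, n=s_odd.shape[0], axis=0)
odd_round_trip_ok = np.allclose(s_odd, s_odd_back)
```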