Search code examples
pythonnumpyaudiosignal-processingfft

Loss of Data in Short-Time Fourier Transform and Inverse: Need Help Improving Audio Quality


I am currently developing my own audio library and have implemented the Short-Time Fourier Transform (STFT) and its inverse as part of the signal processing pipeline. However, I have noticed that the STFT and its inverse operations seem to be causing a significant loss of data, resulting in very poor audio quality.

First I divide the audio signal into overlapping frames, and for each frame, I apply a window function to reduce spectral leakage. Then, I perform the Fourier Transform on each frame to obtain the frequency-domain representation.

Then as a test, I perform the inverse to get the audio back. I have plotted these results below and you can see the degradation.

Spectral output

Here are the functions:

class AudioLib:
    '''Library of audio processing functions.'''
    def __init__(self, blocksize=1024 * 2):
        self.blocksize = blocksize
        self.window = np.hanning(blocksize)

    def stft(self, audio):
        '''Compute the short-time Fourier transform of the audio.'''
        # Split the audio into overlapping blocks
        num_blocks = len(audio) // self.blocksize
        blocks = np.reshape(audio[:num_blocks * self.blocksize], (num_blocks, self.blocksize))

        # Apply the windowing function to each block
        windowed_blocks = blocks * self.window[np.newaxis, :]

        # Compute the Fourier transform of each block
        spectrum = np.fft.fft(windowed_blocks, axis=1)

        return spectrum
    
    def istft(self, spectrum):
        '''Compute the inverse short-time Fourier transform of the spectrum.'''
        # Compute the inverse Fourier transform of each block
        windowed_blocks = np.fft.ifft(spectrum, axis=1).real

        # Apply overlap-and-add to reconstruct the output signal
        output = np.zeros(len(spectrum) * self.blocksize)
        for i, block in enumerate(windowed_blocks):
            output[i * self.blocksize : (i + 1) * self.blocksize] += block

        return output

I have tried changing the block size and while reducing the size improves the audio somewhat, it is still not perfect and I feel as though my implementation is incorrect. Any help regarding this would be greatly appreciated!


Solution

  • Like Christoph Rackwitz said, the problem with this STFT implementation is that the blocks are non-overlapping. For invertibility, you want that each block has 50% overlap with the next block.

    Here is a possible simple implementation for extracting and overlap-adding the blocks:

    # Copyright 2023 Google LLC.
    # SPDX-License-Identifier: Apache-2.0
    
    def extract_blocks(audio: np.ndarray, blocksize: int) -> np.ndarray:
      """Extracts blocks with 50% overlap."""
      hop_step = blocksize // 2
      blocks = []
      offset = 0
    
      while offset + blocksize <= len(audio):
        blocks.append(audio[offset:(offset + blocksize)])
        offset += hop_step
    
      return np.column_stack(blocks)
    
    
    def overlap_add_blocks(blocks: np.ndarray) -> np.ndarray:
      """Overlap-adds blocks with 50% overlap."""
      blocksize, num_blocks = blocks.shape
      hop_step = blocksize // 2
      output = np.zeros(blocksize + (num_blocks - 1) * hop_step)
      offset = 0
    
      for i in range(num_blocks):
        output[offset:(offset + blocksize)] += blocks[:, i]
        offset += hop_step
      
      return output
    

    If you want do to this more efficiently, check out numpy stride_tricks.

    Another detail: for accurate invertibility, the window should be such that adding 50%-overlapped translated copies produces 1.0. To do this, np.hanning needs a small correction. Change self.window = np.hanning(blocksize) to

    self.window = np.hanning(blocksize + 1)[:blocksize]
    

    This plot shows the difference for blocksize = 32. The thick line is the sum of the windows. Notice that thick line is wiggly for np.hanning(blocksize) but perfectly flat and equal to 1.0 for np.hanning(blocksize + 1)[:blocksize].

    Overlap-added windows.