
Windowing an audio signal in Python for a gammatone filterbank implementation


I am new to programming, particularly to Python. I am trying to implement an auditory model using 4th-order gammatone filters. I need to break a signal down into 39 channels. When I used a smaller signal (about 884726 bits), the code ran well, but I think the buffers fill up, so I have to restart the shell to run the code a second time. I tried using flush(), but it didn't work out. So I decided to window the signal using a Hanning window, but I couldn't succeed at that either. To be very clear: I need to break an audio signal into 39 channels, rectify it (half-wave), and then pass it into a second bank of 4th-order filters, this time into 10 channels. I am trying to downsample the signal before sending it into the second bank of filters. This is the piece of code that implements the filter bank using the coefficients generated by another function. The dimensions of b are 39x4096.

import sys
import numpy as np
from scipy import signal

def filterbank_application(input, b, verbose=False):
    """
    A function to run the input through a bandpass filter bank with parameters defined by the b coefficients.

    Parameters:
        * input (type: array-like matrix of floats) - input signal. (Required)
        * b (type: array-like matrix of floats) - the b coefficients of each filter in shape b[numOfFilters][numOfCoeffs]. (Required)

    Returns:
        * y (type: numpy array of floats) - an array with inner dimensions equal to that of the input and outer dimension equal to
          the number of bandpass filters in the bank, containing the outputs of each filter. The output
          signal of the nth filter can be accessed using y[n].
    """
    input = np.array(input)
    bshape = np.shape(b)
    nFilters = bshape[0]
    lengthFilter = bshape[1]

    shape = (nFilters,) + np.shape(input)
    shape = np.array(shape)
    shape[-1] = shape[-1] + lengthFilter - 1
    y = np.zeros(shape)

    for i in range(nFilters):
        if verbose:
            sys.stdout.write("\r" + str(int(np.round(100.0*i/nFilters))) + "% complete.")
            sys.stdout.flush()
        x = np.array(input)
        y[i] = signal.fftconvolve(x, b[i])
    if verbose:
        sys.stdout.write("\n")
    return y

from scipy.io import wavfile

samplefreq, input = wavfile.read('sine_sweep.wav')

input = input.transpose()
input = (input[0] + input[1])/2

b_coeff1 = gammatone_filterbank(samplefreq, 39)
Output = filterbank_application(input, b_coeff1)

Rect_Output = half_rectification(Output)

I want to window the audio into chunks of 20 seconds in length. I would appreciate it if you could let me know an efficient way of windowing my signal, as the whole audio file will be 6 times bigger than the signal I am using. Thanks in advance.


Solution

  • You may have a problem with memory consumption if you run a 32-bit Python. Your code consumes approximately 320 octets (bytes) per sample (40 buffers, 8 octets per sample). The maximum memory available is 2 GB, which means the absolute maximum size for the signal is around 6 million samples. If your file is around 100 seconds long, you may start having problems.

    There are two ways out of that problem (if that really is the problem, but I cannot see any evident reason why your code would otherwise crash). Either get a 64-bit Python or rewrite your code to use memory in a more practical way.

    If I have understood your problem correctly, you want to:

    1. run the signal through 39 FIR filters (4096 points each)
    2. half-rectify the resulting signals
    3. downsample the resulting half-rectified signal
    4. filter each of the downsampled rectified signals by 10 FIR filters (or IIR?)

    This will give you 39 x 10 signals, which give you the attack and frequency response of the incoming auditory signal.

    How I would do this is:

    1. take the original signal and keep it in memory (if it does not fit, that can be fixed by a trick called memmap, but if your signal is not very long it will fit)
    2. take the first gammatone filter and run the convolution (scipy.signal.fftconvolve)
    3. run the half-wave rectification (sig = np.clip(sig, 0, None, out=sig))
    4. downsample the signal (e.g. scipy.signal.decimate)
    5. run the 10 filters (e.g. scipy.signal.fftconvolve)
    6. repeat steps 2-5 for all other gammatones

    This way you do not need to keep the 39 copies of the filtered signal in memory, if you only need the end results.
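    The loop in steps 2-5 can be sketched as follows. This is only a sketch under assumptions: the second bank is taken to be FIR, the downsampling factor q and all coefficient shapes are made up for illustration, and process_streaming is a hypothetical helper, not part of the original code.

```python
import numpy as np
from scipy import signal

def process_streaming(input_sig, b_gamma, b_second, q=8):
    """Process one gammatone channel at a time, so that only one filtered
    copy of the signal lives in memory at any moment.

    b_gamma  : (nGamma, L1) FIR coefficients of the gammatone bank
    b_second : (nSecond, L2) FIR coefficients of the second bank (assumed FIR)
    q        : integer downsampling factor
    """
    results = []
    for bg in b_gamma:
        # step 2: convolve with one gammatone filter
        sig = signal.fftconvolve(input_sig, bg)
        # step 3: half-wave rectification, in place
        sig = np.clip(sig, 0, None, out=sig)
        # step 4: downsample by q with an FIR anti-aliasing filter
        sig = signal.decimate(sig, q, ftype='fir')
        # step 5: run the second bank on the downsampled signal
        results.append([signal.fftconvolve(sig, b2) for b2 in b_second])
    return results  # nGamma lists of nSecond output signals
```

    Because each channel is discarded (or reduced) before the next one is computed, peak memory stays close to a single filtered copy of the signal instead of 39 of them.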

    Without seeing the complete application and knowing more about the environment it is difficult to say whether you really have a memory problem.


    Just a stupid signal-processing question: why half-wave rectification? Why not full-wave rectification: sig = np.abs(sig)? The low-pass filtering is easier with a full-wave rectified signal, and audio signals should be rather symmetric anyway.
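    As a quick illustration of the difference on a toy signal (the values are arbitrary):

```python
import numpy as np

sig = np.array([-2.0, -0.5, 0.0, 1.5, 3.0])

half = np.clip(sig, 0, None)  # negative samples zeroed  -> [0., 0., 0., 1.5, 3.]
full = np.abs(sig)            # negative samples flipped -> [2., 0.5, 0., 1.5, 3.]
```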


    There are a few things which you might want to change in your code:

    • you convert input into an array as the first thing in your function - there is no need to do it again within the loop (just use input instead of x when running the fftconvolve)

    • creating an empty y could be done by y = np.empty((b.shape[0], input.shape[0] + b.shape[1] - 1)); this is more readable and gets rid of a number of unnecessary variables

    • input.transpose() takes some time and memory and is not required. You may instead do input = np.average(input, axis=1), which averages every row in the array (i.e. averages the channels).
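    Putting those three suggestions together, the function might look like the following sketch (it assumes b is already a 2-D NumPy array; the variable names follow the original code):

```python
import sys
import numpy as np
from scipy import signal

def filterbank_application(input, b, verbose=False):
    """Run input through the FIR bank whose rows are the b coefficients."""
    input = np.asarray(input, dtype=float)
    # allocate the output once; every row is fully overwritten below,
    # so np.empty is safe here
    y = np.empty((b.shape[0], input.shape[0] + b.shape[1] - 1))
    for i in range(b.shape[0]):
        if verbose:
            sys.stdout.write("\r%d%% complete." % round(100.0 * i / b.shape[0]))
            sys.stdout.flush()
        y[i] = signal.fftconvolve(input, b[i])
    if verbose:
        sys.stdout.write("\n")
    return y

# averaging the two channels without the transpose:
# samplefreq, input = wavfile.read('sine_sweep.wav')
# input = np.average(input, axis=1)
```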

    There is nothing wrong with your sys.stdout.write, etc. The flush is needed because otherwise the text is written into a buffer and only shown on the screen when the buffer is full.