Tags: python, numpy, type-conversion, pydub, audiosegment

How can I convert an AudioSegment to a NumPy array and back?


As the title states, I have had difficulty converting a PyDub AudioSegment to a NumPy array and back. I know how to convert an AudioSegment to a NumPy array, and I have a hazy idea of how to convert a NumPy array back to an AudioSegment, but the methods I have found vary and do not pair with each other. How can I reliably convert an AudioSegment to an array and back?

This is the code I used to get the array:

from pydub import AudioSegment
import numpy as np

audio = AudioSegment.from_file("/file/path/sillysong.wav")
data = np.array(audio.get_array_of_samples())
data = data.reshape(audio.channels, -1, order='F')
print(data)

I do not know how to get the array in this form back into an AudioSegment. For context, I am using TensorFlow, so I need the data in array form. Thank you for your help! (I'm a new coder, so there's probably something obvious I'm missing.)
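For reference, the reshape above works because get_array_of_samples() returns the channels interleaved (first sample of channel 1, first sample of channel 2, second sample of channel 1, and so on). The sketch below uses a made-up stereo buffer with illustrative values rather than real audio. It shows that the column-major ('F') reshape splits the channels into rows, and that reshaping back with the same order restores the interleaving:

```python
import array

import numpy as np

# Illustrative stand-in for audio.get_array_of_samples() on a 2-channel, 16-bit clip:
# samples are interleaved as ch1, ch2, ch1, ch2, ...
samples = array.array("h", [10, -10, 20, -20, 30, -30])
channels = 2

# Column-major ('F') reshape sends every channel-th sample to the same row,
# giving shape (channels, n_frames).
data = np.array(samples).reshape(channels, -1, order="F")
print(data)
# [[ 10  20  30]
#  [-10 -20 -30]]

# Flattening in the same 'F' order restores the original interleaved layout.
flat = data.reshape(-1, order="F")
print(flat.tolist())
# [10, -10, 20, -20, 30, -30]
```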


Solution

  • Your approach is correct. Here is an example with a stereo file, LowRider.wav, read using pydub:

    from pydub import AudioSegment
    import matplotlib.pyplot as plt
    import numpy as np
    
    # In a Jupyter notebook you can add `%matplotlib notebook` here; it is
    # IPython magic and is not valid in a plain .py script.
    
    audio = AudioSegment.from_file("LowRider.wav")
    # get_array_of_samples() returns interleaved channels; the Fortran-order
    # reshape turns them into shape (channels, n_frames).
    data = np.array(audio.get_array_of_samples())
    data = data.reshape(audio.channels, -1, order='F')
    print("Shape of the converted numpy array:", data.shape)
    
    frame_rate = audio.frame_rate
    n_frames = data.shape[1]
    # Time of each frame in seconds
    time_vector = np.arange(n_frames) / frame_rate
    
    plt.figure()
    # Plot every channel, so this also works for mono files
    for ch in range(audio.channels):
        plt.plot(time_vector, data[ch, :], label=f"Channel {ch + 1}")
    plt.legend()
    plt.xlabel("Time (s)")
    plt.ylabel("Signal")
    plt.show()
    

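    This gives you data, an array with one row per channel (two rows here, since LowRider.wav is stereo). Plotting both channels gives:

    [Figure: Channel 1 and Channel 2 of LowRider.wav, signal vs. time (s)]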

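    To convert the array back into an AudioSegment, flatten it into interleaved order with the same order='F' and pass the raw bytes to the AudioSegment constructor. I included an export so you can test whether the conversion worked: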

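    # Continues from the snippet above: audio and data are already defined.
    # Flatten (channels, n_frames) back into interleaved samples.
    reshaped_data = data.reshape(-1, order='F')
    
    # The bytes must use the same integer type as the original samples. If you
    # converted data to float (e.g. for TensorFlow), cast it back first, or
    # dtype.itemsize will not match the real sample width.
    new_audio = AudioSegment(
        reshaped_data.tobytes(),
        frame_rate=audio.frame_rate,
        sample_width=reshaped_data.dtype.itemsize,
        channels=audio.channels
    )
    
    new_audio.export("LowRider_Exported.wav", format="wav")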
    

    Change the name of the file and let me know if it works :D
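Since the data is headed for TensorFlow, one common step (an assumption here, not something the code above requires) is to scale the integer samples to float32 in [-1, 1]. The AudioSegment constructor above takes sample_width from dtype.itemsize, so a float32 array would be written as 4-byte integer samples and come out as noise. Convert back to the original integer dtype before rebuilding the segment. The following is a minimal NumPy-only sketch. It assumes signed integer samples, such as the int16 array that get_array_of_samples() yields for 16-bit WAV files. The names int_to_float and float_to_int are hypothetical helpers, not part of pydub:

```python
import numpy as np


def int_to_float(samples: np.ndarray) -> np.ndarray:
    """Scale signed integer PCM samples to float32 in [-1.0, 1.0)."""
    if not np.issubdtype(samples.dtype, np.signedinteger):
        raise TypeError(f"expected signed integer samples, got {samples.dtype}")
    full_scale = -float(np.iinfo(samples.dtype).min)  # e.g. 32768.0 for int16
    return samples.astype(np.float32) / np.float32(full_scale)


def float_to_int(x: np.ndarray, dtype) -> np.ndarray:
    """Undo int_to_float: rescale, round, clip to the dtype's range and cast."""
    info = np.iinfo(dtype)
    scaled = np.round(x.astype(np.float64) * -float(info.min))
    return np.clip(scaled, info.min, info.max).astype(dtype)


# Illustrative stand-in for the (channels, n_frames) int16 array from the answer.
data = np.array([[0, 16384, -32768], [32767, -1, 100]], dtype=np.int16)

x = int_to_float(data)                  # float32 array to feed to TensorFlow
restored = float_to_int(x, data.dtype)  # back to int16 before building the AudioSegment
print(x.dtype, restored.dtype, np.array_equal(restored, data))
# float32 int16 True
```

For 16-bit audio this round trip is exact. For 32-bit samples, float32 cannot represent every value, so expect small rounding differences.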