python numpy scipy wav

How to read multiple wav files in Python and convert them to numpy arrays to plot


I need to read multiple wave files named as chunk1.wav, chunk2.wav... in my project directory and convert them into numpy arrays to plot. I am able to do this for a single wav file, convert it to numpy and plot it using matplotlib, but am not able to do it for an array of wav files.

I searched all over for how to import an array of wav files with the read() function from the scipy library. I tried using an array of strings, but the read() function does not seem to "understand" a variable as a parameter, let alone a string array. Any advice on how I can read multiple wav files this way?

import pyaudio
import wave
from matplotlib import pyplot as plt
import numpy as np
from pydub import AudioSegment
from pydub.silence import split_on_silence
from scipy.io.wavfile import read

no_of_files = 15
file_name = []

for i in range(0, no_of_files):
    file_name.append("chunk{0}.wav".format(i))

a = read(file_name[5]) #test to see if it works for one file
a = np.array(a[1],dtype=float)
plt.plot(a)
plt.show()

The error I get:

File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/scipy/io/wavfile.py", line 168, in _read_riff_chunk
    "understood.".format(repr(str1)))
ValueError: File format ''... not understood.


Solution

  • Looking at the scipy internals, this error is raised when the file signature (the first four bytes of the file) is not understood. From your error message, the signature that was read appears to be empty (''), so either the signature is missing or there is some other problem reading data from the file:

    def _read_riff_chunk(fid):
        str1 = fid.read(4)  # File signature
        if str1 == b'RIFF':
            is_big_endian = False
            fmt = '<I'
        elif str1 == b'RIFX':
            is_big_endian = True
            fmt = '>I'
        else:
            # There are also .wav files with "FFIR" or "XFIR" signatures?
            raise ValueError("File format {}... not "
                             "understood.".format(repr(str1)))
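
One way to find out which files are affected is to look at each file's size and its first four bytes before handing it to scipy. The following is only a minimal sketch: `check_wav_signature` is a hypothetical helper name, not part of scipy, and the `chunk{0}.wav` naming and the count of 15 files are taken from the question.

```python
import os


def check_wav_signature(path):
    """Return a short diagnosis string for the file at `path` (hypothetical helper)."""
    if not os.path.exists(path):
        return "missing"
    if os.path.getsize(path) == 0:
        return "empty"
    with open(path, "rb") as fid:
        signature = fid.read(4)  # same four bytes scipy inspects
    if signature in (b"RIFF", b"RIFX"):
        return "ok"
    return "unrecognized signature {!r}".format(signature)


if __name__ == "__main__":
    # File names assumed from the question: chunk0.wav ... chunk14.wav
    for i in range(15):
        name = "chunk{0}.wav".format(i)
        print("{0}: {1}".format(name, check_wav_signature(name)))
```

A file reported as "empty" or "missing" would explain the '' in the error message, since reading four bytes from it returns nothing.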
    

    At a glance, I can't see any similar limitation in Python's built-in wave module, so you could try reading the files with that and converting the data to a numpy array afterwards.
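
A rough sketch of that approach, reading several files with the wave module and converting each one to a numpy array, might look like this. It assumes 16-bit PCM data, which may not match your files, and the helper names `read_wav_as_array` and `read_all` are made up for illustration. An empty file would most likely fail with the wave module as well, so the sketch skips missing or empty files instead of trying to read them.

```python
import os
import wave

import numpy as np


def read_wav_as_array(path):
    """Read a 16-bit PCM wav file; return (sample_rate, samples as float array)."""
    wav = wave.open(path, "rb")
    try:
        sample_rate = wav.getframerate()
        n_channels = wav.getnchannels()
        sample_width = wav.getsampwidth()
        frames = wav.readframes(wav.getnframes())
    finally:
        wav.close()
    if sample_width != 2:
        # Assumption of this sketch: only 16-bit samples are handled.
        raise ValueError("this sketch only handles 16-bit PCM")
    samples = np.frombuffer(frames, dtype="<i2")
    if n_channels > 1:
        # One row per frame, one column per channel.
        samples = samples.reshape(-1, n_channels)
    return sample_rate, samples.astype(float)


def read_all(paths):
    """Read every usable file in `paths`; return {path: (sample_rate, samples)}."""
    arrays = {}
    for path in paths:
        if not os.path.isfile(path) or os.path.getsize(path) == 0:
            print("skipping {0}: missing or empty".format(path))
            continue
        arrays[path] = read_wav_as_array(path)
    return arrays
```

With the `file_name` list from the question, `read_all(file_name)` would return one entry per readable file, and each `samples` array can then be passed to `plt.plot` as in the single-file case.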