python audio wav pyaudio spectrogram

Converting microphone data to frequency spectrum


I'm trying to write a spectrogram program in Python that analyzes and displays the frequency spectrum of microphone input in real time. As a template, I am using the recording example from http://people.csail.mit.edu/hubert/pyaudio/#examples

This template program works fine, but I am unsure of the format of the data that is being returned from the data = stream.read(CHUNK) line. I have done some research on the .wav format, which is used in this program, but I cannot find the meaning of the actual data bytes themselves, just definitions for the metadata in the .wav file.

I understand this program uses 16-bit samples, and that the 'chunks' are stored in Python strings. I was hoping somebody could help me understand exactly what the data in each sample represents. Even just a link to a source for this information would be helpful. I tried googling, but I don't think I know the terminology well enough to search accurately.


Solution

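  • stream.read returns raw binary data (a bytes object in Python 3, a str in Python 2). With format=pyaudio.paInt16 and one channel, every two bytes hold one sample: a signed 16-bit integer between -32768 and 32767 that gives the amplitude of the waveform at that instant. To get the samples as integers, use numpy.frombuffer to turn the bytes into a numpy array (numpy.fromstring is deprecated for binary data), or use Python's built-in struct.unpack.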

    Example:

    import pyaudio
    import numpy
    import struct
    
    CHUNK = 128
    
    p = pyaudio.PyAudio()
    stream = p.open(format=pyaudio.paInt16, channels=1, rate=44100, input=True, frames_per_buffer=CHUNK)
    
    data = stream.read(CHUNK)
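    # Each pair of bytes is one signed 16-bit sample (amplitude at that instant)
    print(numpy.frombuffer(data, dtype=numpy.int16))  # use external numpy module (replaces deprecated fromstring)
    print(struct.unpack('h' * CHUNK, data))  # use built-in struct module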
    
    stream.stop_stream()
    stream.close()
    p.terminate()
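
Once a chunk has been decoded into integers, one common way to get from samples to a frequency spectrum is a windowed FFT. The following is only a minimal sketch, not part of the original answer: the helper name chunk_to_spectrum, the Hann window, and the 2048-sample chunk size are assumptions chosen for illustration, and a synthetic 1 kHz tone stands in for stream.read(CHUNK) so the snippet runs without a microphone.

```python
import numpy as np

RATE = 44100  # sample rate in Hz, same value as passed to p.open() above


def chunk_to_spectrum(data, rate=RATE):
    """Hypothetical helper: raw 16-bit mono bytes -> (frequencies in Hz, magnitudes)."""
    # Decode the bytes into signed 16-bit samples, then use floats for the FFT
    samples = np.frombuffer(data, dtype=np.int16).astype(np.float64)
    # A Hann window reduces spectral leakage at the chunk edges (an assumed choice)
    window = np.hanning(len(samples))
    # rfft returns only the non-negative frequency bins of a real-valued signal
    magnitudes = np.abs(np.fft.rfft(samples * window))
    freqs = np.fft.rfftfreq(len(samples), d=1.0 / rate)
    return freqs, magnitudes


# Stand-in for data = stream.read(CHUNK): a synthetic 1 kHz sine wave as raw bytes
n = 2048
t = np.arange(n) / RATE
tone = (10000 * np.sin(2 * np.pi * 1000.0 * t)).astype(np.int16)

freqs, mags = chunk_to_spectrum(tone.tobytes())
peak_hz = freqs[np.argmax(mags)]
print(peak_hz)  # close to 1000 Hz, within one frequency bin
```

The frequency resolution is rate / chunk length, so the very small CHUNK = 128 used above would give bins roughly 345 Hz wide at 44100 Hz. A larger chunk gives finer frequency detail at the cost of slower updates. For a spectrogram, each new chunk's magnitudes would form one column of the display.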