I'm trying to create a spectrogram program in Python that will analyze and display the frequency spectrum of microphone input in real time. I am using a template program for recording audio from here: http://people.csail.mit.edu/hubert/pyaudio/#examples (the recording example).
This template program works fine, but I am unsure of the format of the data returned by the line data = stream.read(CHUNK). I have done some research on the .wav format, which is used in this program, but I cannot find the meaning of the actual data bytes themselves, just definitions for the metadata in the .wav file.
I understand this program uses 16-bit samples, and that the 'chunks' are stored in Python strings. I was hoping somebody could help me understand exactly what the data in each sample represents. Even just a link to a source for this information would be helpful. I tried googling, but I don't think I know the terminology well enough to search accurately.
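stream.read gives you binary data: the raw bytes of the samples. With format=pyaudio.paInt16, each sample is a signed 16-bit integer (2 bytes, in the machine's native byte order) whose value, between -32768 and 32767, is the amplitude of the signal at that instant. To get the decimal audio samples, you can use numpy.frombuffer to turn the data into a numpy array (older code uses numpy.fromstring, which is deprecated for binary data in recent NumPy releases), or you can use Python's built-in struct.unpack.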
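To see the byte layout without a microphone, here is a minimal sketch that packs a few made-up sample values into bytes and decodes them again. The sample values are hypothetical, and the '<' (little-endian) prefix is an assumption that holds on most desktop hardware; the real stream uses whatever byte order the machine uses natively.

```python
import struct

# Four hypothetical 16-bit sample values (not real recorded audio).
samples = [0, 1, -1, 32767]

# '<' = little-endian (assumed), 'h' = signed 16-bit integer.
# The result mimics what stream.read() returns for paInt16 mono input.
data = struct.pack('<4h', *samples)

print(len(data))    # 2 bytes per sample
print(data.hex())   # raw bytes, low byte of each sample first

# Decoding reverses the packing and recovers the amplitudes.
decoded = list(struct.unpack('<%dh' % (len(data) // 2), data))
print(decoded)
```

With more than one channel the samples would be interleaved (left, right, left, right, ...), so a chunk would hold CHUNK * channels values.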
Example:
import pyaudio
import numpy
import struct
CHUNK = 128  # number of samples to read per call
p = pyaudio.PyAudio()
# 16-bit signed samples, one channel, 44.1 kHz sample rate
stream = p.open(format=pyaudio.paInt16, channels=1, rate=44100, input=True, frames_per_buffer=CHUNK)
data = stream.read(CHUNK)  # raw bytes, 2 bytes per sample
print(numpy.frombuffer(data, dtype=numpy.int16))  # use external numpy module
print(struct.unpack('h' * CHUNK, data))  # use built-in struct module
stream.stop_stream()
stream.close()
p.terminate()
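Since the end goal is a frequency spectrum, here is a rough sketch of how one decoded chunk could be turned into magnitudes per frequency bin with numpy.fft.rfft. It is only an illustration under stated assumptions: the chunk is a synthetic sine wave standing in for stream.read(CHUNK), and the names tone_hz, magnitudes, and peak_bin are made up for this example.

```python
import numpy

RATE = 44100   # samples per second, matching the stream above
CHUNK = 128    # samples per chunk

# Synthetic stand-in for stream.read(CHUNK): a sine wave that falls exactly
# on FFT bin 8, scaled into the int16 range and serialized to raw bytes.
tone_hz = 8 * RATE / CHUNK
t = numpy.arange(CHUNK) / RATE
wave = 10000 * numpy.sin(2 * numpy.pi * tone_hz * t)
data = wave.astype(numpy.int16).tobytes()

# Decode the bytes, then take the magnitude of the real-input FFT.
samples = numpy.frombuffer(data, dtype=numpy.int16)
magnitudes = numpy.abs(numpy.fft.rfft(samples))

# Frequency in Hz that each FFT bin corresponds to.
freqs = numpy.fft.rfftfreq(CHUNK, d=1.0 / RATE)

peak_bin = int(numpy.argmax(magnitudes))
print(peak_bin, freqs[peak_bin])
```

A real spectrogram would probably apply a window function to each chunk and stack the magnitude arrays of successive chunks over time; with only 128 samples per chunk, each bin is about 345 Hz wide, so a larger CHUNK gives finer frequency resolution.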