I am trying to write an audio fingerprinting library for educational purposes. It's based on Computer Vision for Music Identification. I have a couple of questions relating to the contents of the paper.
I know that two bytes represent a sample, so I wrote this class to extract the samples from a PCM file. I'd like to know if this is right (sorry if it's too obvious :) ).
class FingerPrint:
    def __init__(self, pcmFile):
        self.pcmFile = pcmFile
        self.samples = []
        self.init()

    def init(self):
        # Read the PCM file two bytes (one 16-bit sample) at a time
        with open(self.pcmFile, 'rb') as f:
            byte = f.read(2)
            # f.read() returns an empty value at end of file
            while byte:
                self.samples.append(byte)
                byte = f.read(2)

fp = FingerPrint('output.pcm')
If the above code is OK, then according to the paper the next thing to do is to convolve the signal with a low-pass filter and take every 8th sample. I don't understand these steps or why they have to be done; it would be awesome if someone could help me understand (with code if possible).
After reading the two bytes, you need to convert them into an int. You can use the struct module.
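As a rough illustration of the struct approach, the sketch below decodes a byte string into Python ints. It assumes the samples are signed, 16-bit, little-endian and mono, which is common for raw PCM but depends on how the file was produced; `bytes_to_samples` is just a name made up for this example.

```python
import struct

def bytes_to_samples(raw):
    """Decode raw 16-bit PCM bytes into a list of Python ints.

    Assumes signed, little-endian, mono samples (an assumption; check
    how your PCM file was produced).
    """
    count = len(raw) // 2  # number of complete 2-byte samples
    # "<" = little-endian, "h" = signed 16-bit integer
    return list(struct.unpack("<%dh" % count, raw[:count * 2]))

# Six bytes encoding the samples 1, -1 and 256
raw = b"\x01\x00\xff\xff\x00\x01"
print(bytes_to_samples(raw))  # [1, -1, 256]
```

Decoding the whole buffer in one `struct.unpack` call is usually simpler than unpacking each two-byte chunk inside the read loop.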
But I think you should use NumPy and SciPy:
To read a WAV file, you can call scipy.io.wavfile.read()
http://docs.scipy.org/doc/scipy-0.10.0/reference/tutorial/io.html#module-scipy.io.wavfile
If your file is raw PCM data, you can call numpy.fromfile()
http://docs.scipy.org/doc/numpy/reference/generated/numpy.fromfile.html
For example:
data = numpy.fromfile("test.pcm", dtype=numpy.int16)
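To check that the dtype matches your data, a small round trip like the one below may help. It writes a few known samples to a temporary file and reads them back. The explicit little-endian dtype `"<i2"` and the temporary file path are assumptions made for this sketch, not something taken from the question.

```python
import os
import tempfile

import numpy as np

# Write four known 16-bit samples to a temporary raw PCM file.
expected = np.array([0, 1, -1, 32767], dtype="<i2")
path = os.path.join(tempfile.mkdtemp(), "test.pcm")
expected.tofile(path)

# "<i2" = little-endian signed 16-bit; the byte order is an assumption here.
data = np.fromfile(path, dtype="<i2")

print(data.tolist())  # [0, 1, -1, 32767]
```

If the file is stereo, the samples are typically interleaved (left, right, left, right, ...), in which case something like `data.reshape(-1, 2)` should split them into two columns.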
To design a low-pass filter, you can use the filter design functions in scipy.signal:
http://docs.scipy.org/doc/scipy-0.10.1/reference/signal.html#filter-design
To do the convolution, you can use the convolution functions in scipy.signal:
http://docs.scipy.org/doc/scipy-0.10.1/reference/signal.html#convolution
There is also a convolve function in numpy:
http://docs.scipy.org/doc/numpy/reference/generated/numpy.convolve.html
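As for why this is done: taking every 8th sample divides the sample rate by 8, and a signal can only represent frequencies up to half its sample rate. Anything above that new limit would fold back (alias) into the lower frequencies, so it is removed with a low-pass filter first. Here is a minimal sketch of both steps. The filter length (101 taps), the 44100 Hz rate and the test tones are assumptions chosen for illustration, not values from the paper, and `lowpass_and_downsample` is a made-up helper name.

```python
import numpy as np
from scipy import signal

def lowpass_and_downsample(samples, factor=8, numtaps=101):
    """Low-pass filter a 1-D signal, then keep every `factor`-th sample."""
    x = np.asarray(samples, dtype=np.float64)
    # By default firwin's cutoff is relative to the Nyquist frequency (1.0),
    # so 1/factor is the Nyquist frequency of the downsampled signal.
    taps = signal.firwin(numtaps, 1.0 / factor)
    # mode="same" keeps the output aligned with, and as long as, the input.
    filtered = signal.convolve(x, taps, mode="same")
    return filtered[::factor]

# Hypothetical test signal: 1 second at 44100 Hz, 440 Hz tone + 10 kHz tone
rate = 44100
t = np.arange(rate) / rate
x = np.sin(2 * np.pi * 440 * t) + np.sin(2 * np.pi * 10000 * t)

y = lowpass_and_downsample(x)
print(len(x), len(y))  # 44100 5513
```

With these numbers the output rate is 44100 / 8 = 5512.5 Hz, so the 440 Hz tone survives while the 10 kHz tone is filtered out before it can alias. scipy.signal.decimate() combines the filtering and downsampling in one call, if you prefer not to do it by hand.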