Is there a simple way to record a few seconds of sound and convert it to frequency? I have a USB mic and a Raspberry Pi 2 B.
In the file posted (convert2note.py) I am wondering how to make f equal to the frequency obtained from the mic. This is what the program looks like so far:
# d = 69 + 12*log2(f/440)
# d is the MIDI note number, f is the frequency in Hz
import math
f = raw_input("Type the frequency to be converted to midi: ")
d = 69 + (12 * math.log(float(f) / 440)) / math.log(2)
d = int(round(d))
notes = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]
print notes[d % len(notes)]
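For example, typing in 440 should print A, since d = 69 + 12*log2(440/440) = 69 and 69 % 12 = 9, which is "A" in the list.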
Thanks a ton in advance :D
For capturing audio, you could for example use the sox program. See the sox documentation for details, but it could be as simple as:
rec input.wav
To make the file match the format expected by the code below, use:
rec -c 2 -b 16 -e signed-integer -r 44100 input.wav
(Technically only the -c, -b and -e options are necessary to match the code below. You could reduce the sample rate -r to speed up the processing.)
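If you want to start the recording from Python rather than typing the command by hand, here is a minimal sketch, assuming sox's rec is on your PATH; the trim 0 5 effect stops the recording after 5 seconds:
import subprocess

# Record 5 seconds of stereo, 16-bit, 44100 Hz audio into input.wav.
subprocess.check_call([
    'rec', '-c', '2', '-b', '16', '-e', 'signed-integer', '-r', '44100',
    'input.wav', 'trim', '0', '5',
])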
For processing the audio in Python, it would be best to save it in a wav
file, since Python has a module for reading those in the standard library.
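As a quick sanity check you can read the file's header with the wave module and confirm it matches what the code below expects (a small sketch, assuming the recording is called input.wav):
from __future__ import print_function, division
import wave

w = wave.open('input.wav', 'r')
print('channels:', w.getnchannels())      # expect 2 (stereo)
print('sample width:', w.getsampwidth())  # expect 2 bytes = 16 bit
print('sample rate:', w.getframerate())   # expect 44100 Hz
print('duration [s]:', w.getnframes() / w.getframerate())
w.close()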
For converting the audio to frequencies we'll use the discrete Fourier transform, in the form of NumPy's fast Fourier transform for real input (np.fft.rfft). See the code fragment below, where I'm also using matplotlib to make plots.
The code below assumes a 2-channel (stereo) 16-bit WAV file.
from __future__ import print_function, division
import wave
import numpy as np
import matplotlib.pyplot as plt
wr = wave.open('input.wav', 'r')
sz = wr.getframerate()  # samples per second; we read one second of frames at a time
q = 5   # time window to analyze in seconds
c = 12  # number of time windows to process
sf = 1.5  # signal scale factor

for num in range(c):
    print('Processing from {} to {} s'.format(num*q, (num+1)*q))
    avgf = np.zeros(int(sz/2+1))
    snd = np.array([])
    # The sound signal for q seconds is concatenated. The fft over that
    # period is averaged to average out noise.
    for j in range(q):
        # Read one second of frames and interpret the raw bytes as 16-bit integers.
        da = np.frombuffer(wr.readframes(sz), dtype=np.int16)
        # De-interleave the stereo channels.
        left, right = da[0::2]*sf, da[1::2]*sf
        lf, rf = abs(np.fft.rfft(left)), abs(np.fft.rfft(right))
        snd = np.concatenate((snd, (left+right)/2))
        avgf += (lf+rf)/2
    avgf /= q
    # Plot both the signal and the frequencies.
    plt.figure(1)
    a = plt.subplot(211)  # signal
    r = 2**16/2
    a.set_ylim([-r, r])
    a.set_xlabel('time [s]')
    a.set_ylabel('signal [-]')
    x = np.arange(sz*q)/sz
    plt.plot(x, snd)
    b = plt.subplot(212)  # frequencies
    b.set_xscale('log')
    b.set_xlabel('frequency [Hz]')
    b.set_ylabel('|amplitude|')
    plt.plot(abs(avgf))
    plt.savefig('simple{:02d}.png'.format(num))
    plt.clf()
The avgf array now holds the average of the left and right frequency spectra. The plots (saved as simple00.png, simple01.png, ... by the code above) show the signal in the top panel and its frequency content in the bottom panel.
As you can see, a sound signal generally holds many frequencies.
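To connect this back to your convert2note.py: a simple (and fairly crude) way to get a single frequency out of avgf is to pick the bin with the largest magnitude. Because each FFT here covers exactly one second of samples, bin k corresponds to k Hz. A minimal sketch (the helper name dominant_note is just for illustration, and for real sounds it may well pick an overtone or a noise peak rather than the fundamental):
import numpy as np

notes = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

def dominant_note(avgf):
    # Skip bin 0 (the DC offset); with a one-second window the bin
    # index equals the frequency in Hz.
    f = np.argmax(avgf[1:]) + 1
    d = int(round(69 + 12*np.log2(f/440.0)))  # MIDI note number
    return f, notes[d % len(notes)]

f, name = dominant_note(avgf)
print('strongest frequency: {} Hz, closest note: {}'.format(f, name))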