I have a .wav file that has 2 types of sounds: long and short. I need to encode them as bits and write them to a binary file.
I got the code from this SO answer: https://stackoverflow.com/a/53309191/2588339 and using it I get this plot for my input wav file:
As you can see, the first plot has narrower and wider bursts, which correspond to the shorter and longer sounds in my file.
My question is: how can I encode each of the sounds as a bit? For example, each long sound in the file would represent a 1 and each short sound a 0.
EDIT: The 2 types of sound differ in both duration and frequency: the longer sound is lower in frequency and the shorter sound is higher. You can find a sample of the file here: https://vocaroo.com/i/s0A1weOF3I3f
Measuring the loudness of each frequency by taking the FFT of the signal is the more "scientific" way to do it, but the image of the raw signal suggests you can get away with something much simpler than that.
If you take a sliding window at least as wide as one period of the sound's primary frequency (~300 Hz) and find the maximum value within that window, it should be fairly easy to apply a threshold to determine whether the tone is playing at a given time. Here's a quick article I found on rolling window functions.
import numpy as np

def rolling_window(a, window):
    # strided view with one row per window position (no data is copied)
    shape = a.shape[:-1] + (a.shape[-1] - window + 1, window)
    strides = a.strides + (a.strides[-1],)
    return np.lib.stride_tricks.as_strided(a, shape=shape, strides=strides)

# minimum window size: one period in samples (must be an int). could be larger.
window_size = int(sample_rate / primary_freq)
rolling_max = np.max(rolling_window(wav_data, window_size), -1)
threshold_max = rolling_max > threshold  # maybe about 1000ish based on your graph
Then simply determine the length of the runs of True in threshold_max. Again, I'll lean on the community: this answer shows a concise way to get the run lengths of an array (or other iterable).
import itertools

def runs_of_ones(bits):
    # yield the length of each consecutive run of truthy values
    for bit, group in itertools.groupby(bits):
        if bit:
            yield sum(group)

run_lengths = list(runs_of_ones(threshold_max))
The values in run_lengths should now be the length of each "on" pulse of sound, in number of samples. It should now be relatively straightforward to test whether each value is long or short and write the result to a file.
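For that last step, here is a minimal sketch of one way it could look. It is not the only option. The helper names (runs_to_bits, pack_bits, write_bits), the example run lengths, and the midpoint cutoff are all assumptions made for illustration. The midpoint only makes sense if both long and short pulses actually occur in the file. Bits are packed most significant bit first, and the last byte is padded with zeros.

```python
def runs_to_bits(run_lengths, cutoff):
    """Map each pulse length (in samples) to a bit: long -> 1, short -> 0."""
    return [1 if n > cutoff else 0 for n in run_lengths]


def pack_bits(bits):
    """Pack bits into bytes, most significant bit first.

    The final byte is zero-padded on the right if len(bits) is not
    a multiple of 8 (an arbitrary choice made for this sketch).
    """
    out = bytearray()
    for i in range(0, len(bits), 8):
        chunk = list(bits[i:i + 8])
        chunk += [0] * (8 - len(chunk))  # pad the last chunk to 8 bits
        byte = 0
        for b in chunk:
            byte = (byte << 1) | b
        out.append(byte)
    return bytes(out)


def write_bits(path, bits):
    """Write the packed bits to a binary file."""
    with open(path, "wb") as f:
        f.write(pack_bits(bits))


# Hypothetical run lengths in samples; in practice use the
# run_lengths list computed from threshold_max.
example_runs = [4100, 1200, 1150, 4300, 3900, 1300, 1250, 4000, 4200]

# Assumed cutoff: halfway between the shortest and longest pulse.
cutoff = (min(example_runs) + max(example_runs)) / 2

bits = runs_to_bits(example_runs, cutoff)
payload = pack_bits(bits)
```

Calling write_bits("output.bin", bits) would then produce the file. Because of the padding, the reader of the file cannot tell how many bits in the last byte are real, so you may want to store the bit count separately if that matters.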