
How to get a list of frequencies in a wav file


I'm trying to decode some audio that consists of two frequencies (200 Hz for a 0 and 800 Hz for a 1), which translate directly to binary.

The sample I have translates to "1001011". A third frequency, 1600 Hz, is used as a divider between the bits.

I can't find anything that works. I did find a few things, but they were either outdated or simply didn't work, and I'm getting desperate.

I wrote some sample code that generates audio in this encoding (to test the decoder):

import math
import wave
import struct

audio = []
sample_rate = 44100.0

def split(word):
    return [char for char in word]

def append_sinewave(
        freq=440.0,
        duration_milliseconds=10,
        volume=1.0):
    global audio
    num_samples = duration_milliseconds * (sample_rate / 1000.0)
    for x in range(int(num_samples)):
        audio.append(volume * math.sin(2 * math.pi * freq * ( x / sample_rate )))
    return
def save_wav(file_name):
    wav_file=wave.open(file_name,"w")
    nchannels = 1
    sampwidth = 2
    nframes = len(audio)
    comptype = "NONE"
    compname = "not compressed"
    wav_file.setparams((nchannels, sampwidth, sample_rate, nframes, comptype, compname))
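    # pack every sample as a little-endian 16-bit int (WAV byte order) and write them in one call
    frames = b"".join(struct.pack('<h', int(sample * 32767.0)) for sample in audio)
    wav_file.writeframes(frames)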
    wav_file.close()
    return
print("Input data!\n(binary)")
data=input(">> ")
dataL = []
dataL = split(data)
for x in dataL:
    if x == "0":
        append_sinewave(freq=200)
    elif x == "1":
        append_sinewave(freq=800)
    append_sinewave(freq=1600,duration_milliseconds=5)
    print("Making "+str(x)+" beep")


print("\nWriting to file this may take a while!")
save_wav("output.wav")

Thanks in advance for your help!


Solution

  • I think I understand what you are attempting. From your encoder script I assume that each bit translates to 10 milliseconds in your wave file, with a 5ms 1600hz tone as a kind of delimiter. If these durations are fixed, you could simply use scipy and numpy to segment the audio and decode each segment.
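As a quick check of those durations in samples (a sketch assuming the encoder's default 44100 Hz rate): the encoder truncates each tone to `int(duration_ms * sample_rate / 1000)` samples, so the 5 ms delimiter is 220 samples rather than 220.5 and each bit occupies 661 samples. The chunk size `int(rate * 0.015)` used in the decoder below also comes out to 661, so the segments stay aligned across the whole file.

```python
sample_rate = 44100.0

# Mirror the encoder: each tone is truncated to int(duration_ms * samples_per_ms) samples.
bit_samples = int(10 * (sample_rate / 1000.0))    # 10 ms data tone
delim_samples = int(5 * (sample_rate / 1000.0))   # 5 ms delimiter: 220.5 is truncated to 220
per_bit = bit_samples + delim_samples

# The decoder's chunk and offset, computed the same way as in the answer's code.
chunk = int(sample_rate * 0.015)
offset = int(sample_rate * 0.005)

print(bit_samples, delim_samples, per_bit)   # 441 220 661
print(chunk, offset, chunk - offset)         # 661 220 441
print(7 * per_bit)                           # 4627 samples for the 7-bit example
```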

    Given your encoder script above, which generates a 105 ms (7 * 15 ms) mono output.wav for the bit string 1001011, and ignoring the delimiting tones, we should aim to return a list of the frequency of each bit:

    [800, 200, 200, 800, 200, 800, 800]
    

    We can read in the audio using scipy and perform the FFT on segments of the audio using numpy to get the frequencies of each segment:

    from scipy.io import wavfile as wav
    
    import numpy as np
    
    rate, data = wav.read('./output.wav')
    
    # 15ms chunk includes delimiting 5ms 1600hz tone
    duration = 0.015
    
    # calculate the length of our chunk in the np.array using sample rate
    chunk = int(rate * duration)
    
    # length of delimiting 1600hz tone
    offset = int(rate * 0.005)
    
    # number of bits in the audio data to decode
    bits = int(len(data) / chunk)
    
    def get_freq(bit):
        # start position of the current bit
        strt = (chunk * bit) 
        
        # remove the delimiting 1600hz tone
        end = (strt + chunk) - offset
        
        # slice the array for each bit
        sliced = data[strt:end]
    
        w = np.fft.fft(sliced)
        freqs = np.fft.fftfreq(len(w))
    
        # Find the peak in the coefficients
        idx = np.argmax(np.abs(w))
        freq = freqs[idx]
        freq_in_hertz = abs(freq * rate)
        return freq_in_hertz
    
    decoded_freqs = [get_freq(bit) for bit in range(bits)]
    

    yields

    [800.0, 200.0, 200.0, 800.0, 200.0, 800.0, 800.0]
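Why the peaks land on 200 and 800: each analysed slice is 441 samples, so the FFT bins are spaced `rate / 441 = 100` Hz apart and both tones fall on a bin centre. The standalone check below runs on synthetic tones rather than on the WAV file. It uses `np.fft.rfft` and `np.fft.rfftfreq` with `d=1/rate`, which return only the non-negative frequencies, already in Hz, so no `abs()` or multiplication by `rate` is needed. `peak_frequency` is an illustrative helper name, not part of the answer's code.

```python
import numpy as np

rate = 44100
n = 441  # samples in one 10 ms bit tone, delimiter already removed

def peak_frequency(segment, rate):
    """Return the frequency (Hz) of the strongest FFT bin of a real-valued segment."""
    spectrum = np.abs(np.fft.rfft(segment))
    freqs = np.fft.rfftfreq(len(segment), d=1.0 / rate)
    return freqs[np.argmax(spectrum)]

t = np.arange(n) / rate
for tone in (200.0, 800.0):
    segment = np.sin(2 * np.pi * tone * t)
    print(tone, round(peak_frequency(segment, rate)))

print("bin spacing:", rate / n, "Hz")
```

If you change the tones to frequencies that are not multiples of the bin spacing, the peak snaps to the nearest bin, so it is safer to compare against the nominal tones with a tolerance than with exact equality.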
    

    To convert to bits/bytes:

    # compare against the midpoint between 200 and 800 Hz instead of testing exact float equality
    bitsarr = [1 if freq > 500 else 0 for freq in decoded_freqs]
    
    byte_array = bytearray(bitsarr)
    decoded = bytes(byte_array)
    print(decoded, type(decoded))
    

    yields

    b'\x01\x00\x00\x01\x00\x01\x01' <class 'bytes'>
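`bytearray(bitsarr)` stores each bit in its own byte. If you would rather get the original bit string back, or pack the bits into an integer, a sketch like the following works, assuming the first decoded bit is the most significant one:

```python
decoded_freqs = [800.0, 200.0, 200.0, 800.0, 200.0, 800.0, 800.0]  # decoder output from above

# Classify each peak by whichever nominal tone (200 Hz or 800 Hz) it is closer to.
bits = ["1" if abs(f - 800) < abs(f - 200) else "0" for f in decoded_freqs]
bit_string = "".join(bits)

# Interpret the bits as an unsigned big-endian integer (first bit = most significant).
value = int(bit_string, 2)
packed = value.to_bytes((len(bit_string) + 7) // 8, "big")

print(bit_string, value, packed)  # 1001011 75 b'K'
```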
    

    For more information about finding the peak frequency, see this question.
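To test the decoder without writing and re-reading a WAV file, here is a minimal round-trip sketch. It rebuilds the encoder's signal in memory with numpy (assuming the same 44100 Hz rate, 10 ms bit tones, 5 ms 1600 Hz delimiters and the same sample truncation) and decodes it with the same fixed-length segmentation. The `tone`, `encode` and `decode` names are illustrative and do not come from either post.

```python
import numpy as np

RATE = 44100
BIT_MS, DELIM_MS = 10, 5          # durations used by the question's encoder
TONES = {"0": 200.0, "1": 800.0}  # bit value -> tone frequency
DELIM_HZ = 1600.0

def tone(freq, ms):
    # Same truncation as the encoder: int(ms * samples_per_ms) samples, phase restarting at 0.
    n = int(ms * (RATE / 1000.0))
    return np.sin(2 * np.pi * freq * np.arange(n) / RATE)

def encode(bits):
    parts = []
    for b in bits:
        parts.append(tone(TONES[b], BIT_MS))
        parts.append(tone(DELIM_HZ, DELIM_MS))
    return np.concatenate(parts)

def decode(signal):
    bit_len = int(BIT_MS * (RATE / 1000.0))
    chunk = bit_len + int(DELIM_MS * (RATE / 1000.0))
    out = []
    for start in range(0, len(signal) - chunk + 1, chunk):
        seg = signal[start:start + bit_len]  # drop the trailing delimiter tone
        freqs = np.fft.rfftfreq(len(seg), d=1.0 / RATE)
        peak = freqs[np.argmax(np.abs(np.fft.rfft(seg)))]
        out.append("1" if abs(peak - 800) < abs(peak - 200) else "0")
    return "".join(out)

print(decode(encode("1001011")))  # 1001011
```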