ruby audio signal-processing frequency-analysis frequency-domain

How do I interpret audio encoded binary data?

I have built a little program that encodes binary data into a sound. For example the following binary input:

00101101

will produce a 'sound' like this:

################..S.SS.S################

where each character represents a constant unit of time. # stands for a 880 Hertz sine wave which is used to determine start and end of transmission, . stands for silence, representing the zeroes, and S stands for a 440 Hertz sine wave, representing the ones. Obviously, the part in the middle is much longer in practice.

The essence of my question is: How can I invert this operation?

The sound file is transmitted to the recipient via simple playback and recording of the sound. That means I am not trying to decode the original sound file which would be easy.

Obviously I have to analyze the recorded data with respect to frequency. But how? I have read a bit about Fourier Transform but I am quite lost here.

I am not sure where to start but I know that this is not trivial and probably requires quite some knowledge about signal processing. Can somebody point me in the right direction?

BTW: I am doing this in Ruby (I know, it's slow - it's just a proof of concept) but the problem itself is not programming language specific so any answers are very welcome.

Solution

Your problem is clearly trying to demodulate an FSK modulated signal. I would recommend implementing a correlation bank tuned to each frequency, it is a lot faster than fft if speed is one of your concerns