I have built a little program that encodes binary data into sound. For example, the following binary input:
00101101
will produce a 'sound' like this:
################..S.SS.S################
where each character represents a constant unit of time. '#' stands for an 880 Hz sine wave, which marks the start and end of the transmission, '.' stands for silence, representing the zeroes, and 'S' stands for a 440 Hz sine wave, representing the ones. Obviously, the part in the middle is much longer in practice.
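For illustration, the scheme can be sketched in Ruby roughly as follows. The sample rate (8000 Hz), the symbol length (400 samples, i.e. 50 ms), and the method names `tone` and `encode` are assumptions made for this sketch, not the actual program.

```ruby
# One symbol's worth of samples of a sine wave at `freq` Hz (0 means silence).
# sample_rate and symbol_len are assumed values, chosen so that both tones
# complete a whole number of cycles per symbol.
def tone(freq, sample_rate: 8000, symbol_len: 400)
  Array.new(symbol_len) do |n|
    freq.zero? ? 0.0 : Math.sin(2 * Math::PI * freq * n / sample_rate)
  end
end

# Encode a string of '0'/'1' characters:
# 880 Hz marker, then silence for 0 and 440 Hz for 1, then the marker again.
def encode(bits, marker_len: 16, **opts)
  marker  = tone(880, **opts) * marker_len
  payload = bits.each_char.flat_map { |b| b == "1" ? tone(440, **opts) : tone(0, **opts) }
  marker + payload + marker
end

samples = encode("00101101")
puts samples.length
```

The resulting array of floats in the range -1.0 to 1.0 would still have to be written to a sound file or sent to the audio device.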
The essence of my question is: How can I invert this operation?
The sound file is transmitted to the recipient via simple playback and recording of the sound. That means I am not trying to decode the original sound file, which would be easy.
Obviously I have to analyze the recorded data with respect to frequency. But how? I have read a bit about the Fourier transform, but I am quite lost here.
I am not sure where to start but I know that this is not trivial and probably requires quite some knowledge about signal processing. Can somebody point me in the right direction?
BTW: I am doing this in Ruby (I know, it's slow - it's just a proof of concept) but the problem itself is not programming language specific so any answers are very welcome.
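What you are trying to do is demodulate an FSK-modulated signal. I would recommend implementing a correlation bank, with one correlator tuned to each frequency (440 Hz and 880 Hz in your case). Since you only need to detect two frequencies, this is a lot faster than an FFT, if speed is one of your concerns.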
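A minimal sketch of such a correlation bank in Ruby, under some loud assumptions: the sample rate (8000 Hz), the symbol length (400 samples), the detection threshold (0.01), and the method names are all made up for illustration, and the recording is assumed to be already aligned to symbol boundaries. Each window is correlated against a sine and a cosine at the target frequency so that the result does not depend on the phase of the recorded tone.

```ruby
# Normalized power of `window` at `freq` Hz. Correlating against both a
# cosine and a sine makes the result independent of the tone's phase.
def tone_power(window, freq, sample_rate: 8000)
  i = q = 0.0
  window.each_with_index do |x, n|
    phase = 2 * Math::PI * freq * n / sample_rate
    i += x * Math.cos(phase)
    q += x * Math.sin(phase)
  end
  (i * i + q * q) / (window.length**2)
end

# Decide which symbol a window holds: '#' (880 Hz), 'S' (440 Hz) or '.' (silence).
# The threshold is an assumed value and needs tuning for real recordings.
def classify(window, threshold: 0.01)
  p440 = tone_power(window, 440)
  p880 = tone_power(window, 880)
  return "." if [p440, p880].max < threshold
  p880 > p440 ? "#" : "S"
end

# Cut the recording into symbol-sized windows and classify each one.
def demodulate(samples, symbol_len: 400)
  samples.each_slice(symbol_len).map { |w| classify(w) }.join
end
```

On a real recording you would first have to find where the leading 880 Hz marker starts (for example by sliding a window over the beginning of the recording) and then map 'S' to 1 and '.' to 0 between the two markers.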