Detecting noise via mic while playing a song on iPhone

I am making an app that plays a simple audio track and tells me whether there is any noise in the vicinity while the track is playing. To do this, the app records live from the microphone while the song plays through the iPhone's loudspeaker. Any sound that is not part of the music being played counts as noise.

What would be the simplest way to implement this functionality?

I've researched this quite extensively online, but I haven't found anything that points to a solution for this specific problem. The eventual solution may well be a combination of the different techniques I read about.

Things I've already implemented
Playing the song and recording audio simultaneously.

Things I've tried
NOTE: Since we're encouraged to say what we've already tried, I'm adding the following part. I'm by no means saying that this is the right way to solve the problem; it's just something I tried.

I hacked the aurioTouch2 sample application. I played the song back once and recorded the fast Fourier transform (FFT) values, at a fairly low sample rate to keep the amount of recorded data small. Then, when the track was played again, I calculated (per timestep) the correlation coefficient between the graph built from the live playback FFT data and the graph built from the recorded FFT data (the 'squiggly' lines you see when you put the app in FFT mode).
This 'sort of' works. The correlation coefficient is clearly lower when there is excess sound/noise in the room, but it isn't very sensitive, and it also depends on the volume level that was used when recording the FFT data. In the end I'm thinking that this may not be the best approach.
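
For illustration, the per-timestep comparison described above could look roughly like this. It is a minimal sketch in plain Python for prototyping off-device, not the code from the modified aurioTouch2 app. It assumes each timestep is available as a list of FFT magnitudes; the names `pearson` and `is_noisy` and the threshold of 0.8 are made up for this example.

```python
import math

def pearson(reference, live):
    """Pearson correlation coefficient between two equal-length spectra."""
    if len(reference) != len(live) or len(reference) < 2:
        raise ValueError("spectra must have the same length (at least 2 bins)")
    n = len(reference)
    mean_ref = sum(reference) / n
    mean_live = sum(live) / n
    cov = sum((r - mean_ref) * (l - mean_live) for r, l in zip(reference, live))
    var_ref = sum((r - mean_ref) ** 2 for r in reference)
    var_live = sum((l - mean_live) ** 2 for l in live)
    if var_ref == 0 or var_live == 0:
        # A flat spectrum has no shape to compare; treat it as uncorrelated.
        return 0.0
    return cov / math.sqrt(var_ref * var_live)

def is_noisy(reference, live, threshold=0.8):
    """Flag a timestep whose live spectrum differs too much from the recording.

    The threshold is a hypothetical value that would need tuning per song.
    """
    return pearson(reference, live) < threshold

# Hypothetical 4-bin spectra for one timestep.
recorded = [0.1, 0.9, 0.3, 0.05]
quieter = [0.5 * m for m in recorded]   # same spectrum at half the gain
with_noise = [0.1, 0.9, 0.3, 0.8]       # extra energy in the last bin

print(is_noisy(recorded, quieter))      # False
print(is_noisy(recorded, with_noise))   # True
```

Note that the Pearson coefficient does not change when one spectrum is scaled by a uniform gain. The volume dependence described above would therefore have to come from something else, for example a fixed room noise floor that weighs more heavily at low playback volume (an assumption, not something measured here).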

Does anyone think this is possible? If so, what would be the best approach?
Please ask if you need more clarification!

Solution

In the end we decided not to do this in the app. I did get a demo working: I would first run a calibration for the song, collecting a set of its most dominant frequencies, do the same for the ambient room noise, and then use those frequencies in the decision process while the song is playing. It worked all right, although I felt it still needed a lot of tweaking. It was the best I could do with my limited knowledge of audio-related programming :)
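
One way such a calibration-based check could be structured is sketched below in plain Python. The demo's actual decision process isn't spelled out above, so this is only one possible reading of it: a frame is flagged when one of its strongest bins appears in neither the song's nor the room's calibration set. The names `dominant_bins` and `unexpected_bins` and the `top_n` values are hypothetical.

```python
def dominant_bins(frames, top_n=3):
    """Indices of the top_n FFT bins with the highest average magnitude."""
    n_bins = len(frames[0])
    averages = [sum(frame[i] for frame in frames) / len(frames)
                for i in range(n_bins)]
    ranked = sorted(range(n_bins), key=lambda i: averages[i], reverse=True)
    return set(ranked[:top_n])

def unexpected_bins(live_frame, expected, top_n=3):
    """Dominant bins of a live frame that were not seen during calibration."""
    return dominant_bins([live_frame], top_n) - expected

# Calibration with made-up 6-bin magnitude frames.
song_frames = [[0.0, 0.9, 0.1, 0.8, 0.0, 0.0],
               [0.0, 0.8, 0.2, 0.9, 0.0, 0.0]]
ambient_frames = [[0.3, 0.0, 0.0, 0.0, 0.0, 0.05]]
expected = dominant_bins(song_frames, top_n=2) | dominant_bins(ambient_frames, top_n=1)

# While the song plays: an empty result means nothing unexpected was heard.
clean = [0.2, 0.85, 0.1, 0.8, 0.0, 0.0]
noisy = [0.2, 0.85, 0.1, 0.8, 0.95, 0.0]
print(unexpected_bins(clean, expected, top_n=2))   # set()
print(unexpected_bins(noisy, expected, top_n=2))   # {4}
```

A real version would probably need per-timestep calibration sets rather than one set for the whole song, since the dominant frequencies of a track change over time.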