We use the standard method of recording audio in Unity:
_sendingClip = Microphone.Start(_device, true, 10, 16000);
where _sendingClip is the AudioClip and _device is the device name.
I'd like to know when the user stops speaking, which can happen after 2 seconds, or even 10.
I've looked at different sources to find an answer, but could not find one:
The idea is that when the user stops talking, the audio is sent to a speech recognition server without delay, and without the audio getting cut off while the user is still speaking.
Solutions don't need to be in code format. A general direction of where to look would be nice.
You can send the recording audioclip to an AudioSource and play it using:
audioSource.clip = Microphone.Start(_device, true, 60, 16000);
audioSource.loop = true;
while (!(Microphone.GetPosition(_device) > 0)) { } // wait until the microphone has started recording
audioSource.Play();
While it is playing, you can get the spectrum data from the audio with GetSpectrumData. When the user is speaking, the spectrum data will show more peaks, so you can check the average of the spectrum data to determine whether someone is speaking. You should set a minimum level, as there will probably be some noise in the recordings. If the average is above that level, someone is speaking; if it drops below it, the user has stopped speaking.
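using System.Linq; // at the top of the file, needed for Average()
float[] clipSampleData = new float[1024];
bool isSpeaking = false;
public float minimumLevel; // assumed field name: set it from a silent calibration recording
void Update() {
    audioSource.GetSpectrumData(clipSampleData, 0, FFTWindow.Rectangular);
    float currentAverageVolume = clipSampleData.Average();
    if (currentAverageVolume > minimumLevel) {
        isSpeaking = true; // volume above level, so the user is speaking
    } else if (isSpeaking) {
        // volume below level, but user was speaking before. So user stopped speaking
        isSpeaking = false;
    }
}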
You can put that check in the Update method. The spectrum data will be that of the last frame, so it will be close to real time.
The minimum level can be determined by recording something silent. You can do that just before the user needs to speak, or as a separate set-up step.
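Since a single quiet frame can also be a short pause between words, it may help to only report "stopped speaking" after the level has stayed low for a while. Below is a minimal sketch of that idea in plain C#. SpeechEndDetector, silenceTimeout and Process are hypothetical names of my own, not part of Unity, and the right timeout is something you would have to tune yourself.

```csharp
// Hypothetical helper, not part of Unity: debounces the per-frame volume check.
public class SpeechEndDetector
{
    private readonly float _minimumLevel;   // threshold taken from a silent recording
    private readonly float _silenceTimeout; // assumed: seconds of quiet that count as "stopped"
    private float _quietTime;               // how long the level has been below the threshold

    public bool IsSpeaking { get; private set; }

    public SpeechEndDetector(float minimumLevel, float silenceTimeout)
    {
        _minimumLevel = minimumLevel;
        _silenceTimeout = silenceTimeout;
    }

    // Call once per frame. Returns true only on the frame where the user
    // is judged to have stopped speaking.
    public bool Process(float averageVolume, float deltaTime)
    {
        if (averageVolume > _minimumLevel)
        {
            // Loud frame: the user is speaking, so restart the quiet timer.
            IsSpeaking = true;
            _quietTime = 0f;
            return false;
        }

        // Quiet frame, but nobody was speaking before: nothing to report.
        if (!IsSpeaking) return false;

        // Quiet frame after speech: wait until the quiet period is long enough.
        _quietTime += deltaTime;
        if (_quietTime < _silenceTimeout) return false;

        IsSpeaking = false;
        _quietTime = 0f;
        return true;
    }
}
```

In Update you would then call something like detector.Process(currentAverageVolume, Time.deltaTime) and send the audio to the server when it returns true.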
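One possible way to turn such a silent recording into a minimum level is sketched below, in plain C#. It assumes you have collected the per-frame average of the spectrum data while nobody was speaking. NoiseCalibration, MinimumLevel and the margin factor are hypothetical names and choices of my own, so treat this as a starting point rather than the one correct formula.

```csharp
using System;
using System.Linq;

// Hypothetical helper, not part of Unity.
public static class NoiseCalibration
{
    // silentFrameAverages: one average spectrum value per frame, recorded while nobody speaks.
    // margin: assumed safety factor (greater than 1) so ordinary noise stays below the level.
    public static float MinimumLevel(float[] silentFrameAverages, float margin)
    {
        if (silentFrameAverages == null || silentFrameAverages.Length == 0)
            throw new ArgumentException("Need at least one silent frame.", nameof(silentFrameAverages));

        // Take the loudest silent frame and scale it up by the margin.
        return silentFrameAverages.Max() * margin;
    }
}
```

Using the loudest silent frame instead of the mean makes the level a bit more tolerant of occasional noise spikes, at the cost of being less sensitive to very quiet speech.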
With this solution the user will hear themselves speak. To avoid that, you can route the output of the AudioSource to an AudioMixer and set its volume to -80 dB, so the data is still recognized but no sound is output to the user. Setting the volume to 0 on the AudioSource itself will give spectrum data of 0, so use the AudioMixer in that case.