linux, speech-recognition, speech, raspberry-pi, human-computer-interface

Internetless vocal trigger recognition


Speech recognition on handheld devices is usually triggered by pressing a button. How do I go about triggering speech recognition without that? My Raspberry Pi-based device intentionally has nothing users can interact with manually - there is only a microphone hanging out of the wall.

I am trying to implement a way for it to understand a simple trigger command that would initiate a sequence of actions. In short, I want to run a single .sh script whenever it "hears" an audio trigger. It doesn't need to understand anything beyond the trigger itself - there is no meaning to decode from the trigger, such as a script name or parameters. A very simple function: "hear the trigger -> execute the .sh script".

I've explored different options:

  1. Continuously streaming audio to the Google speech recognition service - not a good idea: too much wasted traffic and too many wasted resources.

  2. Having an internetless speech recognition application continuously listen to the audio stream and "pick out" the trigger words - a bit better, yet still largely a waste of resources, and these systems have to be trained on audio samples, which pretty much removes the ability to quickly set custom names for devices.

  3. Using some sort of pitch/loudness processing to make it react to a sequence of loud sounds - two hand claps or something similar (a rough sketch of what I mean follows this list) - not too bad, but I suspect my hands will fall off before the thing is properly tested, or a family member will kill me, since I normally get to experiment with my toys at night when everyone is in bed.

  4. Whistle recognition - not much different from the previous option, but my palms won't get sore, and chances are I survive the testing if I learn to whistle quietly. I found an article by IBM on commanding a computer via whistle commands - the approach is much the same as in local speech recognition applications, except you teach it to understand different whistle sequences. However, from that article I could not figure out how to teach it to recognize just any whistle, regardless of its tone.
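
To make option 3 concrete, here is the rough, untested sketch I have in mind. Python with the sounddevice and numpy packages is just a placeholder choice on my part, as is the /home/pi/trigger.sh path, and the thresholds are guesses that would need tuning:

    # double_clap.py - untested sketch: run a script after two quick claps.
    import subprocess
    import time

    import numpy as np
    import sounddevice as sd

    RATE = 16000        # sample rate, Hz
    FRAME = 1024        # samples per callback (~64 ms)
    CLAP_RMS = 0.3      # loudness threshold on a 0..1 scale - tune per mic
    MIN_GAP = 0.15      # debounce: one clap spanning two frames counts once
    MAX_GAP = 0.7       # max seconds allowed between the two claps

    last_clap = 0.0

    def on_audio(indata, frames, time_info, status):
        global last_clap
        rms = float(np.sqrt(np.mean(indata[:, 0] ** 2)))
        if rms > CLAP_RMS:
            now = time.monotonic()
            gap = now - last_clap
            if MIN_GAP < gap < MAX_GAP:      # second clap in time -> trigger
                last_clap = 0.0
                subprocess.Popen(["/home/pi/trigger.sh"])  # placeholder path
            elif gap >= MAX_GAP:             # treat as the first clap
                last_clap = now

    with sd.InputStream(channels=1, samplerate=RATE, blocksize=FRAME,
                        callback=on_audio):
        while True:
            time.sleep(1)    # all the work happens in the audio callback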

I rather like the whistle idea - it does seem to be the least resource-hungry of these options - how can I do this?
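
Here is the rough shape of what I'm imagining for tone-agnostic whistle detection (again an untested sketch under the same placeholder assumptions): take short FFT frames and fire when a single narrow peak dominates a plausible whistle band for several consecutive frames, whatever its exact frequency:

    # whistle_trigger.py - untested sketch: trigger on any sustained whistle,
    # regardless of its exact pitch. A whistle is nearly a pure tone, so look
    # for one narrow peak dominating the ~1-4 kHz band for several frames.
    import subprocess
    import time

    import numpy as np
    import sounddevice as sd

    RATE = 16000
    FRAME = 2048              # ~128 ms per FFT frame
    BAND = (1000, 4000)       # plausible whistle range, Hz - a guess
    PEAK_RATIO = 10.0         # peak vs. mean band energy - tune empirically
    HOLD_FRAMES = 4           # ~0.5 s of sustained tone before firing

    hold = 0

    def on_audio(indata, frames, time_info, status):
        global hold
        windowed = indata[:, 0] * np.hanning(FRAME)
        spectrum = np.abs(np.fft.rfft(windowed))
        freqs = np.fft.rfftfreq(FRAME, 1.0 / RATE)
        in_band = (freqs >= BAND[0]) & (freqs <= BAND[1])
        peak = spectrum[in_band].max()
        mean = spectrum[in_band].mean() + 1e-9
        if peak / mean > PEAK_RATIO:    # a single tone dominates the band
            hold += 1
            if hold >= HOLD_FRAMES:
                hold = 0
                subprocess.Popen(["/home/pi/trigger.sh"])  # placeholder path
        else:
            hold = 0                    # tone broke off - start counting again

    with sd.InputStream(channels=1, samplerate=RATE, blocksize=FRAME,
                        callback=on_audio):
        while True:
            time.sleep(1)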

Are there other vocal triggers that could be easily implemented, given that I am limited to Raspberry Pi hardware?


Solution

  • Take a look at a Node.js process that handles audio stream events from the microphone and then uses the PocketSphinx offline voice recognizer with a limited custom dictionary to recognize simple voice commands:

    https://github.com/ybutb/yee-voice
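
  • For reference, the same keyword-spotting idea through the pocketsphinx Python bindings might look roughly like the sketch below. The keyphrase, threshold, and script path are placeholders, and it assumes the package's bundled default US English acoustic model plus pyaudio for microphone input:

    # keyphrase_trigger.py - minimal PocketSphinx keyword-spotting sketch.
    import subprocess

    from pocketsphinx import LiveSpeech

    # lm=False puts the decoder in keyword-spotting mode; kws_threshold
    # trades false alarms against missed triggers - tune it empirically.
    speech = LiveSpeech(lm=False, keyphrase='hey device', kws_threshold=1e-20)

    for phrase in speech:                   # yields once per detection
        print('Trigger heard:', phrase)
        subprocess.Popen(['/home/pi/trigger.sh'])   # placeholder script path

    In keyword-spotting mode the decoder only ever listens for the single phrase rather than decoding full speech, which keeps CPU usage manageable on a Raspberry Pi.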