How to start recording when something is said in Python?

I am trying to make a program that uses speech recognition. The problem I am running into is that you have to press a button or Enter to start the speech recognition. Is there a way to make it start recognizing speech when you say a phrase (kind of like "Hey Google") in Python 3?
This is my code for recording audio:

import speech_recognition as sr

r = sr.Recognizer()

with sr.Microphone() as source:
    print("I'm listening!")
    audio = r.listen(source)

try:
    print("You said: " + r.recognize_google(audio))
except sr.UnknownValueError:
    print("I am sorry but I couldn't understand you, try again.")
except sr.RequestError as e:
    print("Could not request results from Google Speech Recognition service; {0}".format(e))

Thanks in advance!

Solution

Yes. Essentially, you have to split your recognition into two parts: keyword recognition (listening solely for a keyword) and the main recognition (recognizing what the user said after the keyword). Be aware that this means your program will always be listening.

For the keyword recognition, you can use the Recognizer's listen_in_background method and scan for the keyword in the callback you give it. If the keyword is found, you then call the Recognizer's listen(source) method to capture what the user says next.
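As far as I know, listen_in_background returns a function that stops the background thread when you call it, so you can shut the listener down cleanly. Here is a minimal sketch of that start/stop pattern. The names listen_until and should_stop are made up for illustration, and the recognizer and source are passed in so the function does not depend on a real microphone:

```python
import time


def listen_until(recognizer, source, callback, should_stop, poll_seconds=0.1):
    """Listen in the background until should_stop() returns True.

    `recognizer` is assumed to behave like sr.Recognizer and `source`
    like sr.Microphone. `listen_until` and `should_stop` are hypothetical
    names used only for this sketch.
    """
    # Start the background listener and keep the function it returns,
    # which stops the listener when called.
    stop_listening = recognizer.listen_in_background(source, callback)
    try:
        # Keep the calling thread alive while the background thread works.
        while not should_stop():
            time.sleep(poll_seconds)
    finally:
        # Always stop the background listener, even if an error occurred.
        stop_listening()
```

With the real library you would call it along the lines of listen_until(sr.Recognizer(), sr.Microphone(), callback, lambda: False), where callback is your keyword-scanning function.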

Since keyword detection requires your program to listen and recognize constantly, you do not want to use any of the speech recognition APIs that require an internet connection (Bing, Google, Watson, Houndify, etc.). All of these have monthly API limits that you would easily burn through, so save them for the actual recognition. I believe your only offline options are recognize_sphinx or Snowboy hotword detection. I've never actually used Snowboy (though I hear it's pretty good) because it doesn't work on Windows (or at least it didn't when I was writing my program), but Sphinx has a keyword detection tool of sorts.

Basically, you pass recognize_sphinx a list of (keyword, sensitivity) tuples through its keyword_entries argument, where the sensitivity is a number between 0 and 1, and it will try to focus on finding those words in the speech. Beware that the more sensitive you make a keyword, the more false positives you'll get.
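If you want to build that list from plain phrases, something like the following could work. It is only a sketch: make_keyword_entries is a hypothetical helper (not part of speech_recognition), and the default sensitivity of 0.8 is an arbitrary starting point that you would tune yourself.

```python
def make_keyword_entries(phrases, sensitivity=0.8):
    """Build a list of (keyword, sensitivity) tuples for keyword_entries.

    Hypothetical helper for illustration; the 0.8 default is an assumption,
    not a recommended value from the library.
    """
    # The sensitivity has to be a number between 0 and 1.
    if not 0 <= sensitivity <= 1:
        raise ValueError("sensitivity must be between 0 and 1")

    entries = []
    for phrase in phrases:
        cleaned = phrase.strip()
        if not cleaned:
            raise ValueError("keywords must not be empty")
        entries.append((cleaned, sensitivity))
    return entries
```

The result can then be passed along as keyword_entries=make_keyword_entries(["hey google", "google"], 0.9).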

Here is an example:

import speech_recognition as sr
import time

r = sr.Recognizer()

# Words that Sphinx should listen closely for, as (keyword, sensitivity)
# pairs. The sensitivity ranges from 0 to 1.
keywords = [("google", 1), ("hey google", 1)]

source = sr.Microphone()


def callback(recognizer, audio):  # this is called from the background thread

    try:
        speech_as_text = recognizer.recognize_sphinx(audio, keyword_entries=keywords)
        print(speech_as_text)

        # Look for the wake word in speech_as_text. "hey google" also
        # contains "google", so one substring check covers both keywords.
        if "google" in speech_as_text:
            recognize_main()

    except sr.UnknownValueError:
        print("Oops! Didn't catch that")


def recognize_main():
    print("Recognizing Main...")
    audio_data = r.listen(source)
    # interpret the user's words however you normally interpret them


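def start_recognizer():
    # Listening happens on a background thread, which also runs the callback
    r.listen_in_background(source, callback)
    # Keep the main thread alive so the program does not exit
    time.sleep(1000000)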


start_recognizer()
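One thing to watch out for in the keyword check: in Python, a bare non-empty string on the right side of `or` is always truthy, so each phrase needs its own `in` test. A small helper along these lines keeps that tidy. heard_wake_word is a made-up name, and lowercasing and collapsing whitespace is my own assumption about how you might want to normalize the recognized text:

```python
def heard_wake_word(text, wake_words=("hey google", "google")):
    """Return True if any wake word occurs in the recognized text.

    Hypothetical helper for illustration; the normalization (lowercase,
    collapsed whitespace) is an assumption, not library behavior.
    """
    # Normalize case and spacing so "HEY   Google" matches "hey google".
    normalized = " ".join(text.lower().split())
    # Test each wake word separately with its own substring check.
    return any(word in normalized for word in wake_words)
```

In the callback you would then write `if heard_wake_word(speech_as_text): recognize_main()`.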

This link is really helpful when working with the speech_recognition library:

https://github.com/Uberi/speech_recognition/blob/master/reference/library-reference.rst