python speech-recognition speech sapi

Microsoft Speech Recognition Custom Training


I have been wanting to create an application using Microsoft Speech Recognition.

My application's users are expected to say abbreviations often, such as 'LHC' (for 'Large Hadron Collider') or 'CERN'. When I say those two terms in that order, my application returns:

You said: At age C.

You said: Cern

While it did work for 'CERN', it failed very badly for 'LHC'.

However, if I could make my own custom training files, I could easily place the term 'LHC' somewhere in there. Then, I could make the user access the Speech Control Panel and run my training file.

All the links I have found for this have been frustratingly useless, as they just say things like 'This is ----, you should try going to the ---- forum instead'.

If it does help, here is a list of the links:

http://compgroups.net/comp.speech.users/add-my-own-training/153194

https://groups.google.com/forum/#!topic/microsoft.public.speech.server/v58SH1ov22s

http://social.msdn.microsoft.com/Forums/en/servercorefordevelopers/thread/f7a35f3f-b352-464a-b264-e16eb4afd049

Is my problem even possible? Or are the training files themselves in a special format? If so, can that format be reproduced?

A solution that can also work on Windows XP would be ideal.

Thanks in advance!

P.S. If there are already any libraries or modules for this, could anyone point me to some? A Python or C/C++ solution would be splendid. Also, since I'd rather not post another question about this: is it possible to use the training utilities from the command prompt (or without the GUI visible, but still with full control over all the options)?


Solution

  • Okay, pulling this from a thing I wrote three or four years ago now, but I believe you want to do something like this.

    A grammar tells the recognizer which words and phrases to listen for. You can create your own grammar restricted to the specific words you care about.

    C#, sorry

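    using System.Speech;
    using System.Speech.Recognition;
    using System.Speech.AudioFormat;

    // Create an in-process recognizer.
    SpeechRecognitionEngine sre = new SpeechRecognitionEngine();

    // The abbreviation is written as separate letters, one per spoken letter.
    string[] words = {"L H C", "CERN"};
    Choices choices = new Choices(words);
    // Build a grammar that accepts any one of the listed phrases.
    GrammarBuilder gb = new GrammarBuilder(choices);
    Grammar grammar = new Grammar(gb);
    sre.LoadGrammar(grammar);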
    

    That is as far as I can get you. From the docs, it looks like you can define pronunciations somehow, so perhaps you could have 'LHC' map directly to a single word that way. Here are the docs on the Grammar class: http://msdn.microsoft.com/en-us/library/system.speech.recognition.grammar.aspx

    Small update: see the example in their docs here: http://msdn.microsoft.com/en-us/library/ms554228.aspx
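
Since the question asked for Python, here is a rough sketch of the same word-list idea from that side. It does not talk to the recognizer at all. It only writes the phrase list out as a W3C SRGS XML grammar, a format the System.Speech Grammar class can load from a file path, and maps the recognized spoken form ("L H C") back to the term the application wants to show ("LHC"). The names SPOKEN_TO_TERM, build_srgs and to_term are made up for illustration, and I have not checked whether the SAPI version shipped with Windows XP accepts SRGS files, so treat that part as an assumption.

```python
import xml.etree.ElementTree as ET

SRGS_NS = "http://www.w3.org/2001/06/grammar"

# Hypothetical mapping: phrase the recognizer listens for -> term the app reports.
SPOKEN_TO_TERM = {
    "L H C": "LHC",
    "CERN": "CERN",
}


def build_srgs(phrases, rule_id="terms", lang="en-US"):
    """Return an SRGS XML grammar string with one <item> per phrase."""
    grammar = ET.Element(
        "grammar",
        {
            "version": "1.0",
            "xml:lang": lang,
            "root": rule_id,      # the rule the recognizer starts from
            "xmlns": SRGS_NS,
        },
    )
    rule = ET.SubElement(grammar, "rule", {"id": rule_id})
    one_of = ET.SubElement(rule, "one-of")  # accept any one of the phrases
    for phrase in phrases:
        ET.SubElement(one_of, "item").text = phrase
    return ET.tostring(grammar, encoding="unicode")


def to_term(recognized_text):
    """Map recognized text back to the app's term; unknown text passes through."""
    # Collapse whitespace and ignore case so "l  h c" still matches "L H C".
    key = " ".join(recognized_text.split()).upper()
    lookup = {spoken.upper(): term for spoken, term in SPOKEN_TO_TERM.items()}
    return lookup.get(key, recognized_text)


if __name__ == "__main__":
    print(build_srgs(SPOKEN_TO_TERM))
    print("You said:", to_term("L H C"))
```

Keeping the spoken form and the displayed term in one table means adding a new abbreviation is a one-line change, and the grammar and the display text cannot drift apart.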