In our ongoing experiments with transcribing video material using various speech-to-text providers, Microsoft seems to be a strong contender when it comes to actual word recognition. For English material, the formatting/punctuation is also quite good, but for Norwegian material (which is most relevant to us) there is hardly any formatting or punctuation at all. We're using the C# SpeechRecognizer API, with config.SpeechRecognitionLanguage set to "nb-NO", config.OutputFormat set to OutputFormat.Detailed, and config.RequestWordLevelTimestamps() called. Is there anything we can do to improve the formatting of the results?
Also, when retrieving single words with timestamps (which is one of our requirements), there is no formatting even for English material. Is there an option we can set to keep formatting/punctuation when retrieving single words?
Best regards, Gunnar
Microsoft Speech formatting support for nb-NO results is indeed very basic at the moment. Display results have basic number formatting, plus explicit punctuation when requested. The Microsoft Speech team is actively working on automatic punctuation and capitalization to improve the results. Regarding timestamps, the service does not currently produce them at the display level, which is why the word-level results come back unformatted. This may be supported in the future.
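To illustrate the split between display text and word timings, here is a minimal sketch that reads both from the detailed JSON result. It is written in Python only to keep it short and self-contained. In the C# SDK the same JSON string should be available through result.Properties.GetProperty(PropertyId.SpeechServiceResponse_JsonResult). The helper name split_detailed_result and the sample payload are made up for illustration, and the field names (NBest, Display, Words, Word, Offset, Duration) are assumed to match the detailed output format.

```python
import json

# Offsets and durations are assumed to be in 100-nanosecond units.
TICKS_PER_SECOND = 10_000_000


def split_detailed_result(raw_json):
    """Return (display_text, words) for the top hypothesis.

    words is a list of (word, start_seconds, duration_seconds) tuples.
    Hypothetical helper, not part of the Speech SDK.
    """
    best = json.loads(raw_json)["NBest"][0]
    words = [
        (w["Word"], w["Offset"] / TICKS_PER_SECOND, w["Duration"] / TICKS_PER_SECOND)
        for w in best.get("Words", [])
    ]
    return best["Display"], words


# Hand-written sample payload for illustration only (not real service output).
SAMPLE = json.dumps({
    "RecognitionStatus": "Success",
    "NBest": [{
        "Lexical": "hello world",
        "Display": "Hello, world.",
        "Words": [
            {"Word": "hello", "Offset": 5000000, "Duration": 2500000},
            {"Word": "world", "Offset": 7500000, "Duration": 5000000},
        ],
    }],
})

if __name__ == "__main__":
    display, words = split_detailed_result(SAMPLE)
    print(display)
    for word, start, duration in words:
        print(f"{start:.2f}s +{duration:.2f}s  {word}")
```

The formatted sentence and the timed words arrive as two separate fields, so any punctuation has to be taken from the display string, while the timestamps are attached to the unformatted words.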