I am working on a speech-to-text Android application. Google APIs are available for both online and offline speech-to-text conversion.
I have tested speech-to-text with both the online and the offline Google API, and the online mode gives noticeably better accuracy than the offline mode. My questions are:
What is the difference between online mode and offline mode? Why is the offline mode less accurate? Is there any solution with better accuracy?
The offline mode is based on a model with a file size of approx. 20.3MB. Since no internet connection is needed, no data has to be sent or received, and this model does speech-to-text about 6.5-7x faster than the online version. The key point here is that this model has a word error rate of 13.5%. That is not terrible, but it is noticeably high, which is to be expected given the limited data and algorithms the model has access to.
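If you want to compare the two modes on your own recordings, word error rate is easy to compute yourself. Below is a minimal sketch in plain Java, assuming the usual definition (substitutions + deletions + insertions, divided by the number of words in the reference transcript). The class and method names are my own for illustration and are not part of any Google API; the tokenization (lower-casing and splitting on whitespace) is also an assumption, and real evaluations often normalize punctuation as well.

```java
import java.util.Locale;

// Hypothetical helper, not part of any speech API.
final class WordErrorRate {
    private WordErrorRate() {}

    /** Lower-cases and splits on whitespace; blank input yields an empty array. */
    static String[] tokenize(String text) {
        String trimmed = text.trim().toLowerCase(Locale.ROOT);
        return trimmed.isEmpty() ? new String[0] : trimmed.split("\\s+");
    }

    /** Word-level Levenshtein distance between reference and hypothesis. */
    static int editDistance(String[] ref, String[] hyp) {
        int[] prev = new int[hyp.length + 1];
        int[] curr = new int[hyp.length + 1];
        // Turning an empty reference into j hypothesis words takes j insertions.
        for (int j = 0; j <= hyp.length; j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= ref.length; i++) {
            curr[0] = i; // i deletions to reach an empty hypothesis
            for (int j = 1; j <= hyp.length; j++) {
                int substitution = prev[j - 1] + (ref[i - 1].equals(hyp[j - 1]) ? 0 : 1);
                int deletion = prev[j] + 1;
                int insertion = curr[j - 1] + 1;
                curr[j] = Math.min(substitution, Math.min(deletion, insertion));
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[hyp.length];
    }

    /** WER = edit distance / number of reference words. */
    static double compute(String reference, String hypothesis) {
        String[] ref = tokenize(reference);
        if (ref.length == 0) {
            throw new IllegalArgumentException("reference must contain at least one word");
        }
        return (double) editDistance(ref, tokenize(hypothesis)) / ref.length;
    }
}
```

Calling `WordErrorRate.compute(reference, hypothesis)` with the transcript you expected and the text the recognizer returned gives a fraction; multiply by 100 for a percentage. Note that the value can exceed 1.0 when the recognizer inserts many extra words.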
The online system has access to far more training data, and the audio is run through more algorithms. I don't think the offline version can be considered a replacement for the online one; it is better seen as a fallback for when the online version is not available. I have also read articles where users claim that 'English US' works better than 'English UK', though I don't know the reasons for this.
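The "fallback, not replacement" idea can be sketched roughly as below. This is plain Java with no Android dependencies: `Recognizer` is a hypothetical interface that you would implement yourself on top of whatever online and offline engines you use, and treating any runtime exception or empty result as "online unavailable" is an assumption you may want to refine.

```java
import java.util.Optional;

// Hypothetical abstraction over a speech engine; not an Android or Google type.
interface Recognizer {
    /** Returns the transcript, or empty if the engine produced no result. */
    Optional<String> transcribe(byte[] audio);
}

// Prefers the online engine and falls back to the offline one.
final class FallbackRecognizer implements Recognizer {
    private final Recognizer online;
    private final Recognizer offline;

    FallbackRecognizer(Recognizer online, Recognizer offline) {
        this.online = online;
        this.offline = offline;
    }

    @Override
    public Optional<String> transcribe(byte[] audio) {
        try {
            Optional<String> result = online.transcribe(audio);
            if (result.isPresent()) {
                return result; // online answered, use the more accurate result
            }
        } catch (RuntimeException e) {
            // Assumption: any failure here means the online engine is unreachable.
        }
        return offline.transcribe(audio);
    }
}
```

The point of this design is that the rest of the app only sees a single `Recognizer`, so the lower-accuracy offline model is used only when the online call fails or returns nothing.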
On some 3G networks, voice and data cannot be used at the same time. WiFi/4G does not have this issue. There are several other known issues, such as constraints imposed by service providers, LTE vs. non-LTE, CDMA, etc. If you are under such a constraint, one option is to change the design so that you cache the data during the call and access the online engine after the call is completed.
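The caching idea might look roughly like the sketch below: clips are held in memory while the call is in progress, then sent to the online engine once it ends. Everything here is hypothetical, including the class name, the `onCallEnded` hook, and representing the online engine as a `Function<byte[], String>`; on a real device you would also need to detect the end of the call and probably persist the clips to storage rather than keep them in memory.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.function.Function;

// Hypothetical helper: defers online transcription until data is available again.
final class DeferredTranscriptionQueue {
    private final Deque<byte[]> pending = new ArrayDeque<>();
    private final Function<byte[], String> onlineEngine;

    DeferredTranscriptionQueue(Function<byte[], String> onlineEngine) {
        this.onlineEngine = onlineEngine;
    }

    /** Stores a recorded clip while the data connection is unavailable. */
    void cache(byte[] clip) {
        pending.addLast(clip.clone()); // copy so later changes by the caller do not affect us
    }

    int pendingCount() {
        return pending.size();
    }

    /** Call once the voice call has finished: transcribes cached clips in order. */
    List<String> onCallEnded() {
        List<String> transcripts = new ArrayList<>();
        while (!pending.isEmpty()) {
            // If the engine throws, the clip stays at the head of the queue for a retry.
            String text = onlineEngine.apply(pending.peekFirst());
            pending.removeFirst();
            transcripts.add(text);
        }
        return transcripts;
    }
}
```

Clips are removed only after the engine returns, so a network failure halfway through leaves the remaining clips queued for the next attempt.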
In my limited experience, CMUSphinx seems like a better bet for offline functionality (since Google is limited to 50 calls a day(?)). A few other available APIs are listed here.
The research paper that enabled offline speech-to-text is linked here [link].