
Android TTS in-text control tags: are they available, or is there an equivalent technique?


I am trying to port a TTS app that uses in-text control tags from desktop/web/iOS to Android. The app builds a text file containing the text to be spoken and the silent periods between the spoken words. Silent periods are represented with in-text control tags, such as the SAPI TTS tag <silence msec="1000"/> or the iOS TTS engine's silence tag [[slnc 10000]].

The text sent to the SAPI TTS speech synthesizer looks like this:

Text one <silence msec="750"/> text two <silence msec="1000"/> text three <silence msec="500"/> Text four <silence msec="600"/> Text five.....

Similarly, for iOS TTS the in-text control tag for silence is [[slnc 10000]], and the text sent to the speech synthesizer looks like this:

Text one [[slnc 750]] text two [[slnc 1000]] text three [[slnc 500]] text four [[slnc 600]] text five......
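Since Android's TextToSpeech does not interpret these tags, one way to reuse the existing scripts is to split them into alternating text and silence segments before handing anything to the TTS engine. The plain-Java sketch below assumes only the two tag shapes shown above, with integer millisecond values; `SilenceTagParser` and `Segment` are hypothetical names, not part of any Android API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: splits a tagged script into text and silence segments.
final class SilenceTagParser {

    /** One piece of the script: either text to speak or a silence of a given length. */
    public static final class Segment {
        public final String text;     // null for a silence segment
        public final long silenceMs;  // 0 for a text segment

        private Segment(String text, long silenceMs) {
            this.text = text;
            this.silenceMs = silenceMs;
        }

        static Segment speech(String text) { return new Segment(text, 0); }
        static Segment silence(long ms) { return new Segment(null, ms); }

        public boolean isSilence() { return text == null; }

        @Override
        public String toString() {
            return isSilence() ? "silence(" + silenceMs + ")" : "speak(\"" + text + "\")";
        }
    }

    // Matches SAPI-style <silence msec="750"/> (group 1) or [[slnc 750]] (group 2).
    private static final Pattern SILENCE_TAG = Pattern.compile(
            "<silence\\s+msec=\"(\\d+)\"\\s*/>|\\[\\[slnc\\s+(\\d+)\\]\\]");

    static List<Segment> parse(String script) {
        List<Segment> segments = new ArrayList<>();
        Matcher m = SILENCE_TAG.matcher(script);
        int last = 0;
        while (m.find()) {
            addText(segments, script.substring(last, m.start()));  // text before the tag
            String ms = m.group(1) != null ? m.group(1) : m.group(2);
            segments.add(Segment.silence(Long.parseLong(ms)));
            last = m.end();
        }
        addText(segments, script.substring(last));  // trailing text after the last tag
        return segments;
    }

    // Adds a text segment, skipping whitespace-only runs between adjacent tags.
    private static void addText(List<Segment> segments, String raw) {
        String trimmed = raw.trim();
        if (!trimmed.isEmpty()) {
            segments.add(Segment.speech(trimmed));
        }
    }

    public static void main(String[] args) {
        String sapi = "Text one <silence msec=\"750\"/> text two <silence msec=\"1000\"/> text three";
        String ios = "Text one [[slnc 750]] text two [[slnc 1000]] text three";
        System.out.println(parse(sapi));
        System.out.println(parse(ios));
    }
}
```

Both calls in `main` print `[speak("Text one"), silence(750), speak("text two"), silence(1000), speak("text three")]`. Each text segment can then be passed to `speak()` and each silence segment to a TTS call that queues silence.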

Android TTS doesn't seem to support in-text control tags. Also, the following two variants of the speak() method seem to use a Google web service, so keeping the timing of the spoken text returned by the synthesizer server accurately aligned with silence periods timed in my own code may be impossible, or unreliable at best.

speak(speech, TextToSpeech.QUEUE_FLUSH, null);

speak(speech, TextToSpeech.QUEUE_ADD, null);

I welcome any Android solution that focuses on preserving accurate timing of silence periods between spoken words.


Solution

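  • Android's TextToSpeech class has the deprecated playSilence() method and its replacement, playSilentUtterance(). Both add a period of silence of a given length to the TTS playback queue, so the TTS service itself plays the pause between utterances instead of the app timing it.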

    playSilentUtterance() was added in API level 21 (Android 5.0). If the app's minimum SDK is API level 21 or higher, use playSilentUtterance(). Otherwise, the deprecated playSilence() is still available for older devices, for example behind a Build.VERSION.SDK_INT check.

    The full signature of playSilentUtterance() is:

    int playSilentUtterance (long durationInMs, int queueMode, String utteranceId)
    

    Here durationInMs is the duration of the silence in milliseconds.

    queueMode is either QUEUE_ADD or QUEUE_FLUSH. With QUEUE_ADD, the silence is played after the engine has finished what it is currently speaking and everything already in the queue. With QUEUE_FLUSH, current playback is stopped and the queue is cleared first, so the silence starts right away.

    Finally, utteranceId is an optional unique identifier for the text (or, in this case, the silence) being queued. It is useful with an UtteranceProgressListener, whose callbacks report this ID.
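Putting the pieces together: queue every text chunk and every gap with QUEUE_ADD, so the TTS service plays them back-to-back and the app never has to time the silences itself. The sketch below is plain Java so it can run off-device. `SpeechQueue`, `RecordingQueue` and `SilenceScheduler` are hypothetical names, and the real Android calls appear only in comments (the API 21+ forms plus the deprecated pre-21 equivalents). This should keep each gap tied to the end of the preceding utterance, but exact gap lengths still depend on the engine.

```java
import java.util.ArrayList;
import java.util.List;

final class SilenceScheduler {

    /**
     * Hypothetical abstraction over the two TextToSpeech calls used here.
     * On Android, an implementation might forward to TextToSpeech:
     *   API 21+: tts.speak(text, TextToSpeech.QUEUE_ADD, null, utteranceId);
     *            tts.playSilentUtterance(durationInMs, TextToSpeech.QUEUE_ADD, utteranceId);
     *   older:   tts.speak(text, TextToSpeech.QUEUE_ADD, params);
     *            tts.playSilence(durationInMs, TextToSpeech.QUEUE_ADD, params);
     *   where params is a HashMap<String, String> mapping
     *   TextToSpeech.Engine.KEY_PARAM_UTTERANCE_ID to utteranceId.
     */
    interface SpeechQueue {
        void speak(String text, String utteranceId);
        void playSilence(long durationInMs, String utteranceId);
    }

    /**
     * Queues texts[0], gapsMs[0], texts[1], gapsMs[1], ..., texts[n-1], all in
     * order (QUEUE_ADD semantics), so each silence follows the text before it.
     * Utterance IDs like "text-0" / "silence-0" let an UtteranceProgressListener
     * tell the pieces apart.
     */
    static void enqueue(SpeechQueue queue, String[] texts, long[] gapsMs) {
        if (gapsMs.length != Math.max(0, texts.length - 1)) {
            throw new IllegalArgumentException("need exactly one gap between consecutive texts");
        }
        for (int i = 0; i < texts.length; i++) {
            queue.speak(texts[i], "text-" + i);
            if (i < gapsMs.length) {
                queue.playSilence(gapsMs[i], "silence-" + i);
            }
        }
    }

    /** Stand-in queue that only records the calls, for running outside Android. */
    static final class RecordingQueue implements SpeechQueue {
        final List<String> calls = new ArrayList<>();

        @Override
        public void speak(String text, String utteranceId) {
            calls.add(utteranceId + ": speak \"" + text + "\"");
        }

        @Override
        public void playSilence(long durationInMs, String utteranceId) {
            calls.add(utteranceId + ": silence " + durationInMs + " ms");
        }
    }

    public static void main(String[] args) {
        RecordingQueue q = new RecordingQueue();
        enqueue(q, new String[] {"Text one", "text two", "text three"}, new long[] {750, 1000});
        q.calls.forEach(System.out::println);
    }
}
```

Running `main` prints the queued calls in order, from `text-0: speak "Text one"` through `silence-0: silence 750 ms` to `text-2: speak "text three"`. On a device, enqueue would be called only after the TextToSpeech.OnInitListener reports TextToSpeech.SUCCESS.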