android text-to-speech google-text-to-speech

TextToSpeech synthesizeToFile not queuing


I'm trying to write some speech to an mp3 file using the TextToSpeech.synthesizeToFile method. The text is longer than 4000 characters, so I've had to break it up into chunks. The problem is that only the last chunk of text ends up in the audio file: the method appears to overwrite the file rather than append to it.

int maxLength = 1000; // TextToSpeech.getMaxSpeechInputLength() - 1; smaller chunks = start streaming sooner?
String fileName = Environment.getExternalStorageDirectory().getAbsolutePath() + "/test_p.mp3";
File file = new File(fileName);
final String utteranceId = "fooo";
for (int i = 0; i < elementListJson.length(); i++) {
    JSONArray elArray = elementListJson.getJSONArray(i);
    nodeType = elArray.get(0).toString();
    nodeText = elArray.get(1).toString();
    ArrayList<String> chunkedNodeText = StringHelper.splitToLineLengthWithoutBreakingWords(nodeText, maxLength);
    for (String chunk : chunkedNodeText) {
        Log.d("TTSW", "p" + partsProcessed + "   = " + chunk);
        int speechStatus = textToSpeech.synthesizeToFile(chunk, null, file, utteranceId);
        if (speechStatus == TextToSpeech.ERROR) {
            showNotificationHint("Error converting text to speech! #1");
        }
    }
}

This code works with the speak() method and the QUEUE param, but I need to write the audio to a single file to give the user pause/rewind/fast-forward controls. I couldn't find a param for telling synthesizeToFile to queue, but according to the doc comment it should queue up work anyway.


Solution

  • I gave this a try, but it was a bit of a Pandora's box!

    It appears that there is no way to "append" a new utterance to an existing file using synthesizeToFile() -- it will always rewrite the file.

    So it appears that the only ways to accomplish what you're trying to do would be:

    A) Write all the separate files (file1, file2, file3, etc.) and then merge them in order after* they have all been created.

    B) Use a loop to write each successive utterance to a "temporary" file, merging the temp file into the "main" file on each iteration. However, since synthesizeToFile() is asynchronous, you would need to pause/control the flow of this loop using an UtteranceProgressListener in order to prevent the temp file from being overwritten prematurely.
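
    The merge step that both options rely on could look roughly like the sketch below. It is plain Java, not part of the Android API, and the class name WavMerger is made up for illustration. It assumes every part is an uncompressed PCM WAV file with the canonical 44-byte header and that all parts share the same sample rate, channel count and bit depth; verify this against what your TTS engine actually writes before relying on it.

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.DataInputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.List;

/** Hypothetical helper: concatenates PCM WAV files that share one audio format. */
final class WavMerger {

    // Assumption: canonical RIFF/WAVE header with no extra chunks before "data".
    static final int HEADER_SIZE = 44;

    /** Writes the audio of all parts, in list order, to output (must not be one of the parts). */
    static void merge(List<File> parts, File output) throws IOException {
        if (parts.isEmpty()) {
            throw new IllegalArgumentException("no parts to merge");
        }

        // Total number of audio bytes across all parts (file size minus header).
        long dataSize = 0;
        for (File part : parts) {
            if (part.length() < HEADER_SIZE) {
                throw new IOException("Too short to be a WAV file: " + part);
            }
            dataSize += part.length() - HEADER_SIZE;
        }
        if (dataSize > Integer.MAX_VALUE - 36) {
            throw new IOException("Merged audio would be too large for this sketch");
        }

        // Reuse the first part's header and patch its two size fields.
        byte[] header = new byte[HEADER_SIZE];
        try (DataInputStream in = new DataInputStream(new FileInputStream(parts.get(0)))) {
            in.readFully(header);
        }
        ByteBuffer fields = ByteBuffer.wrap(header).order(ByteOrder.LITTLE_ENDIAN);
        fields.putInt(4, (int) (36 + dataSize)); // RIFF chunk size
        fields.putInt(40, (int) dataSize);       // "data" chunk size

        try (OutputStream out = new BufferedOutputStream(new FileOutputStream(output))) {
            out.write(header);
            byte[] buffer = new byte[8192];
            for (File part : parts) {
                try (DataInputStream in = new DataInputStream(
                        new BufferedInputStream(new FileInputStream(part)))) {
                    in.readFully(new byte[HEADER_SIZE]); // discard this part's header
                    int n;
                    while ((n = in.read(buffer)) != -1) {
                        out.write(buffer, 0, n);
                    }
                }
            }
        }
    }
}
```

    A call such as WavMerger.merge(Arrays.asList(file1, file2, file3), merged) would then produce a single WAV file. The parts are streamed rather than loaded into memory, since the files can be large.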

    Other considerations:

    1) synthesizeToFile() only produces WAV audio, even if you give the file an .mp3 name.

    2) * - synthesizeToFile() is asynchronous, so you will inevitably need to use an UtteranceProgressListener to prevent premature overwrites.

    3) WAV files are large, so you would need to clean up the intermediate files afterwards.
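
    For option A, waiting until every part has been written might be sketched as follows. The class ChunkSynthesizer, its Callback interface and the part_N file and utterance names are hypothetical. The TextToSpeech and UtteranceProgressListener calls are the standard Android ones (the four-argument synthesizeToFile needs API 21+). The sketch assumes the engine reports onDone once per utterance, and that the callbacks may arrive on a thread other than the UI thread.

```java
import android.speech.tts.TextToSpeech;
import android.speech.tts.UtteranceProgressListener;

import java.io.File;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical helper: one WAV file per chunk, with a callback once all are written. */
final class ChunkSynthesizer {

    /** Hypothetical callback, not part of the Android API. */
    interface Callback {
        void onAllDone(List<File> partsInOrder);
        void onFailure(String utteranceId);
    }

    static void synthesizeChunks(TextToSpeech tts, List<String> chunks,
                                 File outputDir, final Callback callback) {
        // One distinct target file per chunk, so nothing gets overwritten.
        final List<File> parts = new ArrayList<>();
        for (int i = 0; i < chunks.size(); i++) {
            parts.add(new File(outputDir, "part_" + i + ".wav"));
        }
        if (parts.isEmpty()) {
            callback.onAllDone(parts);
            return;
        }

        // Counts down as the engine reports each finished utterance.
        final AtomicInteger remaining = new AtomicInteger(parts.size());

        tts.setOnUtteranceProgressListener(new UtteranceProgressListener() {
            @Override
            public void onStart(String utteranceId) {
                // Nothing to do when synthesis of a part begins.
            }

            @Override
            public void onDone(String utteranceId) {
                if (remaining.decrementAndGet() == 0) {
                    callback.onAllDone(parts); // safe to merge the parts now
                }
            }

            @Override
            public void onError(String utteranceId) {
                callback.onFailure(utteranceId);
            }
        });

        // Each call only queues a request and returns immediately.
        for (int i = 0; i < chunks.size(); i++) {
            int status = tts.synthesizeToFile(chunks.get(i), null, parts.get(i), "part_" + i);
            if (status == TextToSpeech.ERROR) {
                callback.onFailure("part_" + i);
                return;
            }
        }
    }
}
```

    Once onAllDone fires, the part files can be merged in list order and then deleted. Note that setting the listener here replaces any listener the app had already registered on the same TextToSpeech instance.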