c++ linux text-to-speech espeak

Espeak Functionality


I am trying to build some functionality on top of espeak, but I am missing some parameters that I don't know how to obtain. I am working in Code::Blocks on Linux. The following code runs well and reads Arabic text:

    #include <string.h>
    #include <malloc.h>
    #include </usr/local/include/espeak/speak_lib.h>

    int main(int argc, char* argv[]) {
        char text[] = {"الله لطيف "};
        espeak_Initialize(AUDIO_OUTPUT_PLAYBACK, 0, NULL, 0);
        espeak_SetVoiceByName("ar");
        unsigned int size = 0;
        while (text[size] != '\0') size++;
        unsigned int flags = espeakCHARS_AUTO | espeakENDPAUSE;
        espeak_Synth(text, size + 1, 0, POS_CHARACTER, 0, flags, NULL, NULL);
        espeak_Synchronize();
        return 0;
    }

Could you help us find the following in espeak?

1. A function that returns the generated wave so we can store it in a variable

2. The frequency (sample rate)

3. The number of channels

4. The sample size

5. A buffer in which we can store the samples

6. The number of samples


Solution

  • If you can't find a suitable example, you will have to read the documentation in the header file. I haven't used it myself, but it looks pretty comprehensible:

    http://espeak.sourceforge.net/speak_lib.h

    When you called espeak_Initialize you passed in AUDIO_OUTPUT_PLAYBACK. You will need to pass in AUDIO_OUTPUT_RETRIEVAL instead, and then it looks like you must call espeak_SetSynthCallback with a function of your own creation to accept the samples.

    Your adapted code would look something like this (UNTESTED):

    #include <string.h>
    #include <vector> 
    #include </usr/local/include/espeak/speak_lib.h> 
    
    int samplerate; // determined by espeak, will be in Hertz (Hz)
    const int buflength = 200; // passed to espeak, in milliseconds (ms)
    
    std::vector<short> sounddata;
    
    int SynthCallback(short *wav, int numsamples, espeak_EVENT *events) {
        if (wav == NULL)
            return 1; // NULL means done.
    
        /* process your samples here, let's just gather them */
        sounddata.insert(sounddata.end(), wav, wav + numsamples);
        return 0; // 0 continues synthesis, 1 aborts 
    }
    
    int main(int argc, char* argv[] ) {
        char text[] = {"الله لطيف "};
        samplerate = espeak_Initialize(AUDIO_OUTPUT_RETRIEVAL, buflength, NULL, 0);
        if (samplerate < 0)
            return 1; // espeak_Initialize returns -1 on failure
        espeak_SetSynthCallback(&SynthCallback);
        espeak_SetVoiceByName("ar"); 
        unsigned int flags=espeakCHARS_AUTO | espeakENDPAUSE;
        size_t size = strlen(text); 
        espeak_Synth(text, size + 1, 0, POS_CHARACTER, 0, flags, NULL, NULL); 
        espeak_Synchronize();
    
        /* in theory sounddata holds your samples now... */
    
        return 0; 
    }
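
Once the callback has filled sounddata, the sample count and the rate returned by espeak_Initialize are enough to sanity-check what was captured. The following is a minimal sketch, not part of espeak: `AudioSummary` and `Summarize` are hypothetical names, and the duration calculation assumes the data is mono.

```cpp
#include <cstddef>
#include <cstdlib>
#include <vector>

// Hypothetical helper type, not part of espeak: summarizes a buffer of
// 16-bit samples such as the sounddata vector gathered by the callback.
struct AudioSummary {
    std::size_t sample_count;  // total samples collected
    double duration_seconds;   // sample_count / sample rate (assumes mono)
    int peak;                  // largest absolute sample value seen
};

AudioSummary Summarize(const std::vector<short>& samples, int samplerate_hz) {
    AudioSummary s{samples.size(), 0.0, 0};
    if (samplerate_hz > 0)
        s.duration_seconds = static_cast<double>(samples.size()) / samplerate_hz;
    for (short v : samples) {
        // Widen to int before abs so that -32768 does not overflow.
        int magnitude = std::abs(static_cast<int>(v));
        if (magnitude > s.peak)
            s.peak = magnitude;
    }
    return s;
}
```

Calling `Summarize(sounddata, samplerate)` after espeak_Synchronize() returns would report how many samples arrived, roughly how long the speech is, and how much of the 16-bit range the synthesizer actually used.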
    

    So for your questions:

    1. Function which returns the generated wave to store it in a variable - There is no single function that returns it. You write a callback function, and espeak calls it repeatedly with small chunks of the waveform, each covering roughly buflength milliseconds, for you to process. If you want to accumulate the data into a larger buffer, I've shown how you could do that yourself.

    2. Frequency - It doesn't look like you can pick it through this API; espeak does. It's in Hz and is the return value of espeak_Initialize, stored as samplerate above.

    3. Number of Channels - There's no mention of it, and voice synthesis is generally mono, one would think. (Vocals are mixed center by default in most stereo mixes...so you'd take the mono data you got back and play the same synthesized data on left and right channels.)

    4. Sample Size - You get shorts. Those are signed integers, 2 bytes each, with a range of -32,768 to 32,767. It probably uses the entire range and doesn't seem to be configurable, but you could test and see what you get out.

    5. A Buffer In Which We Store Samples - The synthesis buffer appears to belong to espeak, which handles the allocation and freeing of it. I've shown an example of using a std::vector to gather chunks from multiple calls.

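    6. Number of Samples - Each call to your SynthCallback will get a potentially different number of samples in numsamples. You might get 0 for that number, and that does not necessarily mean synthesis has finished; in the code above, the end is signaled by wav being NULL. The total is whatever you have accumulated by then.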
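
To tie points 2 through 6 together, here is a hedged sketch of how the collected samples could be turned into stereo and packed into an in-memory WAV image. None of these functions come from espeak: `MonoToStereo`, `BuildWav`, `AppendLE` and `AppendTag` are hypothetical helpers, and the code assumes the samples are 16-bit PCM and that the channel count you pass in is correct (1 for the mono data guessed at above).

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Appends the low `bytes` bytes of value in little-endian order, as WAV requires.
static void AppendLE(std::vector<std::uint8_t>& out, std::uint32_t value, int bytes) {
    for (int i = 0; i < bytes; ++i)
        out.push_back(static_cast<std::uint8_t>((value >> (8 * i)) & 0xFF));
}

// Appends a four-character chunk tag such as "RIFF" or "data".
static void AppendTag(std::vector<std::uint8_t>& out, const char* tag) {
    for (int i = 0; i < 4; ++i)
        out.push_back(static_cast<std::uint8_t>(tag[i]));
}

// Duplicates each mono sample into left and right slots (interleaved stereo).
std::vector<short> MonoToStereo(const std::vector<short>& mono) {
    std::vector<short> stereo;
    stereo.reserve(mono.size() * 2);
    for (short v : mono) {
        stereo.push_back(v);  // left
        stereo.push_back(v);  // right
    }
    return stereo;
}

// Builds a canonical 44-byte PCM header followed by the sample data.
// `samples` must already be interleaved if channels > 1.
std::vector<std::uint8_t> BuildWav(const std::vector<short>& samples,
                                   int samplerate_hz, int channels) {
    const std::uint32_t bytes_per_sample = 2;  // 16-bit samples
    const std::uint32_t data_bytes =
        static_cast<std::uint32_t>(samples.size()) * bytes_per_sample;
    const std::uint32_t block_align =
        static_cast<std::uint32_t>(channels) * bytes_per_sample;
    const std::uint32_t rate = static_cast<std::uint32_t>(samplerate_hz);

    std::vector<std::uint8_t> out;
    out.reserve(44 + data_bytes);
    AppendTag(out, "RIFF");
    AppendLE(out, 36 + data_bytes, 4);   // size of everything after this field
    AppendTag(out, "WAVE");
    AppendTag(out, "fmt ");
    AppendLE(out, 16, 4);                // fmt chunk size for plain PCM
    AppendLE(out, 1, 2);                 // audio format 1 = PCM
    AppendLE(out, static_cast<std::uint32_t>(channels), 2);
    AppendLE(out, rate, 4);
    AppendLE(out, rate * block_align, 4);  // byte rate
    AppendLE(out, block_align, 2);
    AppendLE(out, 16, 2);                // bits per sample
    AppendTag(out, "data");
    AppendLE(out, data_bytes, 4);
    for (short v : samples)
        AppendLE(out, static_cast<std::uint16_t>(v), 2);
    return out;
}
```

With the answer's variables, `BuildWav(sounddata, samplerate, 1)` would give you bytes you can write to a .wav file, and `BuildWav(MonoToStereo(sounddata), samplerate, 2)` would give the stereo version. Playing the result is a quick way to confirm that the mono and 16-bit assumptions hold for your espeak build.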