speech-recognition, codenameone, google-speech-api

How to use Google Speech API from Codename One?


I want to record audio on the phone and then send it to the Google Speech non-streaming API. I can record using Capture.captureAudio(), but I do not know the audio encoding or the sample rate of the recording, and both are required for the API request. How can I find out the audio encoding and the sample rate so that I can send them with my API request?


Solution

  • If you check the Codename One sources, the Android port records in AMR-WB:

            recorder.setAudioSource(MediaRecorder.AudioSource.MIC);
            recorder.setOutputFormat(MediaRecorder.OutputFormat.THREE_GPP);
            recorder.setAudioEncoder(MediaRecorder.AudioEncoder.AMR_WB);
            recorder.setOutputFile(temp.getAbsolutePath());
    

    The Google Speech API accepts AMR-WB if you set the audio format correctly: the encoding must be AMR_WB and the sample rate 16000 Hz.
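As a rough illustration of "setting the audio format", here is a minimal sketch of the JSON body for a non-streaming `speech:recognize` REST call. It assumes the audio is AMR-WB at 16000 Hz (the rate the API documents for that encoding). The class and method names are hypothetical, and `java.util.Base64` is plain Java SE; inside Codename One you would likely need its own Base64 utility instead.

```java
import java.util.Base64;

// Hypothetical helper, not part of Codename One or any Google client library.
final class SpeechRequestSketch {

    // Builds the JSON request body by hand. languageCode is inserted
    // without escaping, so pass only simple tags such as "en-US".
    static String buildRecognizeBody(byte[] amrWbAudio, String languageCode) {
        // The REST API expects the audio bytes as a base64 string.
        String content = Base64.getEncoder().encodeToString(amrWbAudio);
        return "{"
            + "\"config\":{"
            + "\"encoding\":\"AMR_WB\","
            + "\"sampleRateHertz\":16000,"
            + "\"languageCode\":\"" + languageCode + "\""
            + "},"
            + "\"audio\":{\"content\":\"" + content + "\"}"
            + "}";
    }
}
```

The resulting string would be sent as the POST body of the recognize request; authentication and the HTTP call itself are left out of this sketch.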

    Another problem is that the file is recorded as AMR-WB inside a 3GPP container, so you need custom code to extract the audio data from the 3GPP file. For example:

    import java.io.ByteArrayInputStream;

    // "#!AMR-WB\n": magic header of an AMR-WB file
    // (narrowband AMR would use "#!AMR\n" instead)
    private static final byte[] AMR_MAGIC_HEADER =
            {0x23, 0x21, 0x41, 0x4d, 0x52, 0x2d, 0x57, 0x42, 0x0a};

    // FileTypeBox and MediaDataBox are helper classes (not shown here) that
    // parse the 'ftyp' and 'mdat' box headers from the stream.
    public byte[] convert3gpDataToAmr(byte[] data) {
        if (data == null) {
            return null;
        }

        ByteArrayInputStream bis = new ByteArrayInputStream(data);
        // Read the file type ('ftyp') header; you can check that it is correct here
        FileTypeBox ftypHeader = new FileTypeBox(bis);
        // Read the media data ('mdat') header; you can check that it is correct here
        MediaDataBox mdatHeader = new MediaDataBox(bis);
        int rawAmrDataLength = mdatHeader.getDataLength();
        int fullAmrDataLength = AMR_MAGIC_HEADER.length + rawAmrDataLength;
        byte[] amrData = new byte[fullAmrDataLength];
        // Output = magic header followed by the raw AMR frames from 'mdat'
        System.arraycopy(AMR_MAGIC_HEADER, 0, amrData, 0, AMR_MAGIC_HEADER.length);
        bis.read(amrData, AMR_MAGIC_HEADER.length, rawAmrDataLength);
        return amrData;
    }
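The snippet above relies on box-parsing helper classes that are not shown. As a self-contained alternative, here is a minimal sketch that walks the top-level boxes of the file itself. It assumes the standard box layout (4-byte big-endian size, 4-byte type, optional 64-bit size) and that the first `mdat` box holds nothing but raw AMR-WB frames, which may not hold for every recording. The class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper: extracts the 'mdat' payload and prepends the AMR-WB magic.
final class ThreeGpAmrSketch {
    static final byte[] AMR_WB_MAGIC =
            "#!AMR-WB\n".getBytes(StandardCharsets.US_ASCII);

    // Reads an unsigned 32-bit big-endian integer.
    static long readUInt32(byte[] b, int off) {
        return ((b[off] & 0xFFL) << 24) | ((b[off + 1] & 0xFFL) << 16)
             | ((b[off + 2] & 0xFFL) << 8) | (b[off + 3] & 0xFFL);
    }

    // Returns the payload of the first top-level 'mdat' box, or null if
    // none is found or the box structure looks malformed.
    static byte[] findMdatPayload(byte[] file) {
        int pos = 0;
        while (pos + 8 <= file.length) {
            long size = readUInt32(file, pos);
            String type = new String(file, pos + 4, 4, StandardCharsets.US_ASCII);
            int headerLen = 8;
            if (size == 1) {            // 64-bit size stored after the type
                if (pos + 16 > file.length) return null;
                size = (readUInt32(file, pos + 8) << 32) | readUInt32(file, pos + 12);
                headerLen = 16;
            } else if (size == 0) {     // box extends to the end of the file
                size = file.length - pos;
            }
            if (size < headerLen || pos + size > file.length) return null;
            if (type.equals("mdat")) {
                return Arrays.copyOfRange(file, pos + headerLen, (int) (pos + size));
            }
            pos += (int) size;          // skip to the next top-level box
        }
        return null;
    }

    // Builds an AMR-WB file image: magic header followed by the 'mdat' payload.
    static byte[] toAmrWbFile(byte[] threeGp) {
        byte[] payload = findMdatPayload(threeGp);
        if (payload == null) return null;
        byte[] out = Arrays.copyOf(AMR_WB_MAGIC, AMR_WB_MAGIC.length + payload.length);
        System.arraycopy(payload, 0, out, AMR_WB_MAGIC.length, payload.length);
        return out;
    }
}
```

Scanning for `mdat` by type, rather than assuming it immediately follows `ftyp`, keeps the sketch working if other boxes happen to come first.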
    

    Also note that AMR-WB gives slightly lower recognition accuracy than uncompressed audio, so you might want to consider capturing raw audio through a lower-level native API instead of Codename One.