java · arrays · audio · byte · java-audio

How to turn a PCM byte array into little-endian and mono?


I'm trying to feed audio from an online communication app into the Vosk speech recognition API.

The audio comes in the form of a byte array with this audio format: PCM_SIGNED 48000.0 Hz, 16 bit, stereo, 4 bytes/frame, big-endian. In order to process it with Vosk, it needs to be converted to mono and little-endian.

This is my current attempt:

        byte[] audioData = userAudio.getAudioData(1);
        short[] convertedAudio = new short[audioData.length / 2];
        ByteBuffer buffer = ByteBuffer.allocate(convertedAudio.length * Short.BYTES);
        
        // Convert to mono, I don't think I did it right though
        int j = 0;
        for (int i = 0; i < audioData.length; i += 2)
            convertedAudio[j++] = (short) (audioData[i] << 8 | audioData[i + 1] & 0xFF);

        // Convert to little endian
        buffer.order(ByteOrder.BIG_ENDIAN);
        for (short s : convertedAudio)
            buffer.putShort(s);
        buffer.order(ByteOrder.LITTLE_ENDIAN);
        buffer.rewind();

        for (int i = 0; i < convertedAudio.length; i++)
            convertedAudio[i] = buffer.getShort();

        queue.add(convertedAudio);
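
For reference, a manual conversion along these lines is sketched below (my own sketch, not tested against the app's output): it reads each 4-byte big-endian stereo frame, averages the left and right samples into a single channel (averaging is just one possible downmix choice), and writes the mixed sample back out as little-endian bytes.

        // Sketch: 16-bit big-endian stereo PCM -> 16-bit little-endian mono,
        // downmixing by averaging the two channels. Assumes audioData.length
        // is a multiple of 4 (one stereo frame = 4 bytes).
        static byte[] toMonoLittleEndian(byte[] audioData) {
            byte[] mono = new byte[audioData.length / 2];
            for (int i = 0, j = 0; i + 3 < audioData.length; i += 4, j += 2) {
                // Each channel sample arrives big-endian: high byte first
                short left  = (short) ((audioData[i]     << 8) | (audioData[i + 1] & 0xFF));
                short right = (short) ((audioData[i + 2] << 8) | (audioData[i + 3] & 0xFF));
                short mixed = (short) ((left + right) / 2);
                // Write the mixed sample little-endian: low byte first
                mono[j]     = (byte) (mixed & 0xFF);
                mono[j + 1] = (byte) ((mixed >> 8) & 0xFF);
            }
            return mono;
        }

Note that this only changes the channel count and byte order; it does not resample the 48 kHz audio, which is why the answer below converts through AudioSystem with a 16 kHz target format instead.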

Solution

  • I had this same problem and found this Stack Overflow post that converts the raw PCM byte array into an AudioInputStream.

    I assume you're using the Java Discord API (JDA), so here's the initial code I have for the 'handleUserAudio()' function; it uses Vosk together with the code from the link I provided above:

        // Define the audio format that Vosk expects
        AudioFormat target = new AudioFormat(
                16000, 16, 1, true, false);

        try {
            byte[] data = userAudio.getAudioData(1.0f);
            // Create an audio stream that uses the target format,
            // wrapping the byte array input stream from Discord
            AudioInputStream inputStream = AudioSystem.getAudioInputStream(target,
                    new AudioInputStream(
                            new ByteArrayInputStream(data), AudioReceiveHandler.OUTPUT_FORMAT, data.length));

            // This is what was used before
            // InputStream inputStream = new ByteArrayInputStream(data);

            int nbytes;
            byte[] b = new byte[4096];
            while ((nbytes = inputStream.read(b)) >= 0) {
                if (recognizer.acceptWaveForm(b, nbytes)) {
                    System.out.println(recognizer.getResult());
                } else {
                    System.out.println(recognizer.getPartialResult());
                }
            }
            // queue.add(data);
        } catch (Exception e) {
            e.printStackTrace();
        }

    This works so far; however, all of the output comes back through the recognizer's '.getPartialResult()' method rather than '.getResult()', but at least Vosk is understanding the audio coming from the Discord bot.
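
    One follow-up note (my own addition, not something verified in the original answer): Vosk's Recognizer also exposes getFinalResult(), which returns whatever is still buffered once the stream is exhausted, so calling it after the read loop should yield the remaining text even when acceptWaveForm() never signalled a completed utterance:

        // After the while loop has consumed the stream
        // (sketch, assuming `recognizer` is the org.vosk.Recognizer used above)
        System.out.println(recognizer.getFinalResult());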