java · arrays · audio · byte · java-audio

How to turn a PCM byte array into little-endian and mono?


I'm trying to feed audio from an online communication app into the Vosk speech recognition API.

The audio comes in the form of a byte array with this audio format: PCM_SIGNED 48000.0 Hz, 16 bit, stereo, 4 bytes/frame, big-endian. In order to process it with Vosk, it needs to be converted to mono and little-endian.

This is my current attempt:

        byte[] audioData = userAudio.getAudioData(1);
        short[] convertedAudio = new short[audioData.length / 2];
        ByteBuffer buffer = ByteBuffer.allocate(convertedAudio.length * Short.BYTES);
        
        // Convert to mono, I don't think I did it right though
        int j = 0;
        for (int i = 0; i < audioData.length; i += 2)
            convertedAudio[j++] = (short) (audioData[i] << 8 | audioData[i + 1] & 0xFF);

        // Convert to little endian
        buffer.order(ByteOrder.BIG_ENDIAN);
        for (short s : convertedAudio)
            buffer.putShort(s);
        buffer.order(ByteOrder.LITTLE_ENDIAN);
        buffer.rewind();

        for (int i = 0; i < convertedAudio.length; i++)
            convertedAudio[i] = buffer.getShort();

        queue.add(convertedAudio);
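
For reference, a manual conversion along these lines is sketched below (my own sketch, not tested against the app's output): it reads each 4-byte big-endian stereo frame, averages the left and right samples into a single channel (averaging is just one possible downmix choice), and writes the mixed sample back out as little-endian bytes.

        // Sketch: 16-bit big-endian stereo PCM -> 16-bit little-endian mono,
        // downmixing by averaging the two channels. Assumes audioData.length
        // is a multiple of 4 (one stereo frame = 4 bytes).
        static byte[] toMonoLittleEndian(byte[] audioData) {
            byte[] mono = new byte[audioData.length / 2];
            for (int i = 0, j = 0; i + 3 < audioData.length; i += 4, j += 2) {
                // Each channel sample arrives big-endian: high byte first
                short left  = (short) ((audioData[i]     << 8) | (audioData[i + 1] & 0xFF));
                short right = (short) ((audioData[i + 2] << 8) | (audioData[i + 3] & 0xFF));
                short mixed = (short) ((left + right) / 2);
                // Write the mixed sample little-endian: low byte first
                mono[j]     = (byte) (mixed & 0xFF);
                mono[j + 1] = (byte) ((mixed >> 8) & 0xFF);
            }
            return mono;
        }

Note that this only changes the channel count and byte order; it does not resample the 48 kHz audio, which is why the answer below converts through AudioSystem with a 16 kHz target format instead.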

Solution

  • I had this same problem and found this Stack Overflow post that converts the raw PCM byte array into an AudioInputStream.

    I assume you're using the Java Discord API (JDA), so here's the initial code I have for the 'handleUserAudio()' function; it uses Vosk together with the code from the link I provided above:

        // Define the audio format that Vosk expects
        AudioFormat target = new AudioFormat(
                16000, 16, 1, true, false);

        try {
            byte[] data = userAudio.getAudioData(1.0f);
            // Create an audio stream that uses the target format,
            // wrapping the byte array input stream from Discord
            AudioInputStream inputStream = AudioSystem.getAudioInputStream(target,
                    new AudioInputStream(
                            new ByteArrayInputStream(data), AudioReceiveHandler.OUTPUT_FORMAT, data.length));

            // This is what was used before
            // InputStream inputStream = new ByteArrayInputStream(data);

            int nbytes;
            byte[] b = new byte[4096];
            while ((nbytes = inputStream.read(b)) >= 0) {
                if (recognizer.acceptWaveForm(b, nbytes)) {
                    System.out.println(recognizer.getResult());
                } else {
                    System.out.println(recognizer.getPartialResult());
                }
            }
            // queue.add(data);
        } catch (Exception e) {
            e.printStackTrace();
        }

    This works so far; however, all of the output comes back through the recognizer's '.getPartialResult()' method rather than '.getResult()', but at least Vosk is understanding the audio coming from the Discord bot.
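
    One follow-up note (my own addition, not something verified in the original answer): Vosk's Recognizer also exposes getFinalResult(), which returns whatever is still buffered once the stream is exhausted, so calling it after the read loop should yield the remaining text even when acceptWaveForm() never signalled a completed utterance:

        // After the while loop has consumed the stream
        // (sketch, assuming `recognizer` is the org.vosk.Recognizer used above)
        System.out.println(recognizer.getFinalResult());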