I'm trying to feed audio from an online communication app into the Vosk speech recognition API.
The audio arrives as a byte array in this format: PCM_SIGNED, 48000.0 Hz, 16 bit, stereo, 4 bytes/frame, big-endian.
To be processed by Vosk, it needs to be mono and little-endian.
This is my current attempt:
byte[] audioData = userAudio.getAudioData(1);
short[] convertedAudio = new short[audioData.length / 2];
ByteBuffer buffer = ByteBuffer.allocate(convertedAudio.length * Short.BYTES);
// Convert to mono, I don't think I did it right though
int j = 0;
for (int i = 0; i < audioData.length; i += 2)
    convertedAudio[j++] = (short) (audioData[i] << 8 | audioData[i + 1] & 0xFF);
// Convert to little endian
for (short s : convertedAudio)
    for (int i = 0; i < convertedAudio.length; i++)
        convertedAudio[i] = buffer.getShort();
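For reference, the first loop above decodes every 16-bit sample but keeps left and right interleaved, so the result is still stereo. A minimal sketch of a manual conversion could look like the following: it averages the two channels of each 4-byte frame and writes the result low byte first. The class and method names (`PcmDownmix`, `stereoBigEndianToMonoLittleEndian`) are made up for illustration. The sample rate is left at 48000 Hz, so this assumes the recognizer is configured for that rate or the audio is resampled separately.

```java
import java.util.Arrays;

public class PcmDownmix {

    /**
     * Converts 16-bit signed big-endian stereo PCM into 16-bit signed
     * little-endian mono by averaging the left and right sample of each frame.
     * The sample rate is not changed.
     */
    public static byte[] stereoBigEndianToMonoLittleEndian(byte[] stereo) {
        int frames = stereo.length / 4; // 2 channels * 2 bytes per sample
        byte[] mono = new byte[frames * 2];
        for (int f = 0; f < frames; f++) {
            int i = f * 4;
            // Big-endian input: the high byte comes first and carries the sign,
            // the low byte is masked so it is treated as unsigned.
            int left = (stereo[i] << 8) | (stereo[i + 1] & 0xFF);
            int right = (stereo[i + 2] << 8) | (stereo[i + 3] & 0xFF);
            int mixed = (left + right) / 2; // simple average of both channels
            // Little-endian output: low byte first, then high byte.
            mono[2 * f] = (byte) mixed;
            mono[2 * f + 1] = (byte) (mixed >> 8);
        }
        return mono;
    }

    public static void main(String[] args) {
        // One frame: left = 0x0100 (256), right = 0x0300 (768), average = 0x0200 (512)
        byte[] frame = {0x01, 0x00, 0x03, 0x00};
        System.out.println(Arrays.toString(stereoBigEndianToMonoLittleEndian(frame))); // prints "[0, 2]"
    }
}
```

Averaging is only one way to downmix; taking just the left channel would also give mono, at the cost of dropping anything that is only in the right channel.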
I had the same problem and found this Stack Overflow post, which converts the raw PCM byte array into an audio input stream.
I assume you're using the Java Discord API (JDA), so here is my initial code for the 'handleUserAudio()' method, which combines Vosk with the code from the post linked above:
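import java.io.ByteArrayInputStream;
import javax.sound.sampled.AudioFormat;
import javax.sound.sampled.AudioInputStream;
import javax.sound.sampled.AudioSystem;
import net.dv8tion.jda.api.audio.AudioReceiveHandler;

// Define the audio format that Vosk uses: 16 kHz, 16 bit, mono, signed, little-endian
AudioFormat target = new AudioFormat(16000, 16, 1, true, false);
try {
    byte[] data = userAudio.getAudioData(1.0f);
    // Create an audio stream that uses the target format and the byte array input stream from Discord.
    // The length argument of AudioInputStream is counted in sample frames, not bytes.
    AudioInputStream inputStream = AudioSystem.getAudioInputStream(target,
            new AudioInputStream(
                    new ByteArrayInputStream(data), AudioReceiveHandler.OUTPUT_FORMAT,
                    data.length / AudioReceiveHandler.OUTPUT_FORMAT.getFrameSize()));
    // This is what was used before
    // InputStream inputStream = new ByteArrayInputStream(data);
    int nbytes;
    byte[] b = new byte[4096];
    while ((nbytes = inputStream.read(b)) >= 0) {
        if (recognizer.acceptWaveForm(b, nbytes)) {
            // Vosk detected the end of an utterance
            System.out.println(recognizer.getResult());
        } else {
            // Still mid-utterance: only a partial hypothesis is available
            System.out.println(recognizer.getPartialResult());
        }
    }
    // queue.add(data);
} catch (Exception e) {
    e.printStackTrace();
}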
This works so far. However, everything ends up in the recognizer's '.getPartialResult()' output rather than in a final result, but at least Vosk understands the audio coming from the Discord bot.
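To try the format conversion on its own, without JDA or Vosk on the classpath, something like the sketch below may help. It assumes the source format matches what the question reports (48 kHz, 16 bit, stereo, signed, big-endian) and feeds one second of silence through the same `AudioSystem.getAudioInputStream` call. `FormatConversionCheck` and `convert` are illustrative names, not part of any library.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import javax.sound.sampled.AudioFormat;
import javax.sound.sampled.AudioInputStream;
import javax.sound.sampled.AudioSystem;

public class FormatConversionCheck {

    // Assumption: same layout as the format quoted in the question.
    static final AudioFormat SOURCE = new AudioFormat(48000f, 16, 2, true, true);
    // 16 kHz, 16 bit, mono, signed, little-endian.
    static final AudioFormat TARGET = new AudioFormat(16000f, 16, 1, true, false);

    /** Converts raw PCM in SOURCE format to raw PCM in TARGET format. */
    static byte[] convert(byte[] pcm) throws IOException {
        // The third constructor argument is the length in sample frames.
        AudioInputStream in = new AudioInputStream(
                new ByteArrayInputStream(pcm), SOURCE, pcm.length / SOURCE.getFrameSize());
        try (AudioInputStream out = AudioSystem.getAudioInputStream(TARGET, in)) {
            ByteArrayOutputStream result = new ByteArrayOutputStream();
            byte[] chunk = new byte[4096];
            int n;
            while ((n = out.read(chunk)) >= 0) {
                result.write(chunk, 0, n);
            }
            return result.toByteArray();
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println("conversion supported: "
                + AudioSystem.isConversionSupported(TARGET, SOURCE));
        byte[] oneSecondOfSilence = new byte[48000 * SOURCE.getFrameSize()];
        System.out.println("converted bytes: " + convert(oneSecondOfSilence).length);
    }
}
```

If `isConversionSupported` reports false on a given runtime, the manual route (downmixing and resampling by hand) is the fallback.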