I am currently working on a project where I am trying to stream audio from one device to another in real time over WebSockets. For this I am building a 'cross-platform' implementation that works with browsers, Android, and iOS. My goal is to record and play PCM audio in various formats. The PCM produced by the browser (Chrome & Firefox) is 32-bit, which I am trying to play on an Android phone. Just for reference, here is the project.
On Android I record with AudioRecord and stream the raw PCM over a WebSocket to another device, and I play it back with AudioTrack. Everything works fine with 16-bit encoding, a 44100 Hz sample rate, and 2 channels. However, it does not work with 32-bit encoding: the 32-bit recording from the browser does not play, even though I interleave the channels correctly, and when I record on Android with 32-bit encoding, the browser does not play anything either.
I tried playing a 32-bit WAV file on Android, and it works fine. However, I don't know whether the system down-samples it in the background.
My goal is to avoid down- and up-sampling as much as possible, since I am aiming for low latency.
I could not find any solution online. Is this a common issue? Am I missing something here?
With 32-bit encoding, write() returns AudioTrack.ERROR_INVALID_OPERATION:
// write() returns a count on success, or a negative error code.
int result = audioTrack.write(buffer, 0, buffer.length, AudioTrack.WRITE_BLOCKING);
if (result == AudioTrack.ERROR_BAD_VALUE) {
    System.out.println("ERROR: bad value");
} else if (result == AudioTrack.ERROR_DEAD_OBJECT) {
    System.out.println("ERROR: dead object");
} else if (result == AudioTrack.ERROR_INVALID_OPERATION) {
    System.out.println("ERROR: invalid operation");
} else if (result == AudioTrack.ERROR) {
    System.out.println("ERROR: ??");
} else {
    System.out.println("Successfully written to buffer!");
}
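For reference, the track on the playing side is created along these lines. This is a minimal sketch, not the exact project code; the builder values are assumptions mirroring the defaults from the metadata class below (44100 Hz, stereo, float PCM):

AudioTrack audioTrack = new AudioTrack.Builder()
        .setAudioFormat(new AudioFormat.Builder()
                .setEncoding(AudioFormat.ENCODING_PCM_FLOAT)
                .setSampleRate(44100)
                .setChannelMask(AudioFormat.CHANNEL_OUT_STEREO)
                .build())
        .setBufferSizeInBytes(AudioTrack.getMinBufferSize(44100,
                AudioFormat.CHANNEL_OUT_STEREO, AudioFormat.ENCODING_PCM_FLOAT))
        .setTransferMode(AudioTrack.MODE_STREAM)
        .build();
audioTrack.play();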
Implementation for recording audio:
public class AudioStream {
    private AudioStreamMetadata metadata = AudioStreamMetadata.getDefault();
    ...
    public void start() {
        ...
        new Thread(() -> {
            socket.send("started");
            socket.send(metadata.toString());
            while (!hasStopped) {
                // Read one interleaved chunk of float samples (requires ENCODING_PCM_FLOAT).
                float[] data = new float[metadata.getBufferSize()];
                recorder.read(data, 0, data.length, AudioRecord.READ_BLOCKING);
                // Serialize the floats as little-endian bytes before sending.
                byte[] output = new byte[data.length * metadata.getBytesPerSample()];
                ByteBuffer.wrap(output).order(ByteOrder.LITTLE_ENDIAN).asFloatBuffer().put(data);
                socket.send(ByteString.of(output));
            }
        }).start();
    }

    private void initRecorder() {
        int min = AudioRecord.getMinBufferSize(metadata.getSampleRate(),
                metadata.getChannels(true), metadata.getEncoding());
        recorder = new AudioRecord(MediaRecorder.AudioSource.MIC, metadata.getSampleRate(),
                metadata.getChannels(true), metadata.getEncoding(), min);
    }
}
The AudioStreamMetadata class:
public class AudioStreamMetadata {
    public static final int DEFAULT_SAMPLE_RATE = 44100;
    public static final int DEFAULT_CHANNELS = 2;
    public static final int DEFAULT_ENCODING = 32;
    public static final int DEFAULT_BUFFER_SIZE = 6144 * 4;
    ...
    public AudioStreamMetadata(int sampleRate, int bufferSize, int channels, int encoding) {
        this.sampleRate = sampleRate;
        this.bufferSize = bufferSize;
        this.channels = channels;
        this.encoding = encoding;
        this.bytesPerSample = encoding / 8;
        this.bufferSizeInBytes = bufferSize * bytesPerSample;
    }

    // getters

    public int getChannels(boolean in) {
        if (channels == 1) {
            return in ? AudioFormat.CHANNEL_IN_MONO : AudioFormat.CHANNEL_OUT_MONO;
        } else if (channels == 2) {
            return in ? AudioFormat.CHANNEL_IN_STEREO : AudioFormat.CHANNEL_OUT_STEREO;
        } else {
            return 0;
        }
    }

    public int getEncoding() {
        // Note: 32-bit maps to float PCM, not to a 32-bit integer format.
        if (encoding == 8) {
            return AudioFormat.ENCODING_PCM_8BIT;
        } else if (encoding == 16) {
            return AudioFormat.ENCODING_PCM_16BIT;
        } else if (encoding == 32) {
            return AudioFormat.ENCODING_PCM_FLOAT;
        } else {
            return 0;
        }
    }

    public static AudioStreamMetadata getDefault() {
        return new AudioStreamMetadata(DEFAULT_SAMPLE_RATE, DEFAULT_BUFFER_SIZE, DEFAULT_CHANNELS, DEFAULT_ENCODING);
    }
}
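With these defaults, one chunk is 24576 float samples (6144 * 4) and therefore 98304 bytes on the wire. A quick sanity check; getBytesPerSample() is used in the recorder loop above, while getBufferSizeInBytes() is assumed to be among the elided getters:

AudioStreamMetadata md = AudioStreamMetadata.getDefault();
System.out.println(md.getBytesPerSample());     // 4      (32 bits / 8)
System.out.println(md.getBufferSizeInBytes());  // 98304  (24576 samples * 4 bytes)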
I assumed that AudioTrack would be able to handle different datatypes in write(), since I initialize it with the correct configuration. However, an AudioTrack initialized with 8-bit encoding accepts only byte, one initialized with 16-bit encoding accepts both byte and short, and an AudioTrack with 32-bit float encoding accepts only float.
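To make the pairing explicit, here is a hypothetical helper that picks the write() overload matching the track's encoding. The method is my own illustration, not part of the project:

// Illustration only: choose the write() overload that the encoding expects.
static int writeChunk(AudioTrack track, byte[] raw, int encoding) {
    if (encoding == AudioFormat.ENCODING_PCM_8BIT
            || encoding == AudioFormat.ENCODING_PCM_16BIT) {
        // 8-bit and 16-bit tracks accept raw byte[] directly.
        return track.write(raw, 0, raw.length, AudioTrack.WRITE_BLOCKING);
    } else if (encoding == AudioFormat.ENCODING_PCM_FLOAT) {
        // A float track accepts only float[]; reinterpret the little-endian bytes.
        FloatBuffer fb = ByteBuffer.wrap(raw).order(ByteOrder.LITTLE_ENDIAN).asFloatBuffer();
        float[] samples = new float[fb.remaining()];
        fb.get(samples);
        return track.write(samples, 0, samples.length, AudioTrack.WRITE_BLOCKING);
    }
    return AudioTrack.ERROR_BAD_VALUE;
}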
I am receiving the data from the socket as a byte[], which I therefore needed to convert to a float[]:
@Override
public void onMessage(WebSocket webSocket, ByteString bytes) {
    super.onMessage(webSocket, bytes);
    byte[] buffer = bytes.toByteArray();
    // Match the little-endian order used on the recording side; wrap()
    // alone defaults to big-endian and would scramble the samples.
    FloatBuffer fb = ByteBuffer.wrap(buffer).order(ByteOrder.LITTLE_ENDIAN).asFloatBuffer();
    float[] out = new float[fb.capacity()];
    fb.get(out);
    int result = audioTrack.write(out, 0, out.length, AudioTrack.WRITE_BLOCKING);
}
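One pitfall worth noting: the byte order has to be set explicitly when wrapping the buffer. ByteBuffer defaults to big-endian, while the recording side writes little-endian floats, so without order(ByteOrder.LITTLE_ENDIAN) the reconstructed samples come out as garbage.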