I have been using the opus library for a while now, mostly with mono channel audio. Recently, however, I have been working on dual channel audio and trying to get it to work with Opus. I initially thought it would be pretty straightforward: I'd just increase the number of channels in the opus_encoder_create function and increase my frame_size in the opus_encode function. However, when I encode the audio, all I get out is a jumbled mess that appears to be a summation of the two waveforms. The input is PCM data, interleaved by sample (L1, R1, L2, R2, ... LN, RN). I am unsure how to proceed; any advice or sample code would be appreciated.
So far, I have also tried supplying the data as (L1, L2, ... LN, R1, R2, ... RN). When I do it that way, the audio is intelligible, but the two channels are still corrupted, and I don't believe this is the correct approach.
Here is a sample of how I am doing the conversion:
#include <opus/opus.h>
#include <stdbool.h>
#include <stdint.h>

#define NUM_CHANNELS 2
#define SAMPLE_RATE 8000
#define FRAME_SIZE 960
#define OUTPUT_FRAME_BYTES FRAME_SIZE / 8

int main(void) {
    int err;
    OpusEncoder *encoder;
    int16_t pcm_data[FRAME_SIZE * NUM_CHANNELS];
    uint8_t compressed_data[OUTPUT_FRAME_BYTES * NUM_CHANNELS];

    while (true) {
        /* Gather dual channel PCM data from the microphone and interleave it:
           pcm_data[0] = left channel sample 1
           pcm_data[1] = right channel sample 1
           pcm_data[2] = left channel sample 2
           pcm_data[3] = right channel sample 2
           ...
           pcm_data[FRAME_SIZE * NUM_CHANNELS - 2] = left channel sample FRAME_SIZE
           pcm_data[FRAME_SIZE * NUM_CHANNELS - 1] = right channel sample FRAME_SIZE
        */
        encoder = opus_encoder_create(8000, NUM_CHANNELS,
                                      OPUS_APPLICATION_RESTRICTED_LOWDELAY, &err);
        opus_encoder_ctl(encoder, OPUS_SET_COMPLEXITY(0));
        opus_encode(encoder, pcm_data, FRAME_SIZE, compressed_data,
                    OUTPUT_FRAME_BYTES * NUM_CHANNELS);
        // Write compressed data to file
        opus_encoder_destroy(encoder);
    }
    return 0;
}
I think you have an incorrect FRAME_SIZE. The official documentation says:
To encode a frame, opus_encode() or opus_encode_float() must be called with exactly one frame (2.5, 5, 10, 20, 40 or 60 ms) of audio data
The frame size can be calculated as frameSize = frameLength * sampleRate / 1000, where frameLength is in ms. So at your 8000 Hz sample rate you should have a frame size between 20 samples (for 2.5 ms) and 480 samples (for 60 ms).
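For example, with your 8000 Hz input a 20 ms frame works out to 160 samples per channel. A rough sketch of how you could express that (the 20 ms frame length is just one of the allowed durations):

#define SAMPLE_RATE     8000
#define FRAME_LENGTH_MS 20   /* must be one of 2.5, 5, 10, 20, 40 or 60 ms */
/* samples per channel per frame: 20 * 8000 / 1000 = 160 */
#define FRAME_SIZE      ((FRAME_LENGTH_MS * SAMPLE_RATE) / 1000)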
Another thing: your compressed_data size shouldn't depend on FRAME_SIZE. In the documentation for opus_encode() you can find the suggested size for the output payload (max_data_bytes):
4000 bytes is recommended
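Putting both points together, here is a minimal sketch of how the stereo encode loop could look: the encoder is created once outside the loop, FRAME_SIZE is 20 ms worth of samples, and the output buffer is 4000 bytes. capture_interleaved_pcm() is a hypothetical stand-in for your microphone code; everything else is the regular Opus API.

#include <opus/opus.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define NUM_CHANNELS    2
#define SAMPLE_RATE     8000
#define FRAME_LENGTH_MS 20
#define FRAME_SIZE      ((FRAME_LENGTH_MS * SAMPLE_RATE) / 1000)  /* 160 samples per channel */
#define MAX_PACKET_SIZE 4000                                      /* recommended output buffer size */

/* Hypothetical capture stand-in: fills the buffer with interleaved
   stereo samples (L1, R1, L2, R2, ...). Replace with your real
   microphone read; here it just writes silence. */
static void capture_interleaved_pcm(int16_t *pcm, int frame_size)
{
    for (int i = 0; i < frame_size * NUM_CHANNELS; i++)
        pcm[i] = 0;
}

int main(void)
{
    int err = OPUS_OK;
    int16_t pcm_data[FRAME_SIZE * NUM_CHANNELS];
    uint8_t packet[MAX_PACKET_SIZE];

    /* Create the encoder once, outside the loop. */
    OpusEncoder *encoder = opus_encoder_create(SAMPLE_RATE, NUM_CHANNELS,
                                               OPUS_APPLICATION_RESTRICTED_LOWDELAY, &err);
    if (err != OPUS_OK || encoder == NULL)
        return 1;
    opus_encoder_ctl(encoder, OPUS_SET_COMPLEXITY(0));

    while (true) {
        capture_interleaved_pcm(pcm_data, FRAME_SIZE);

        /* frame_size is per channel; the pcm buffer must hold
           FRAME_SIZE * NUM_CHANNELS interleaved samples. */
        opus_int32 nbytes = opus_encode(encoder, pcm_data, FRAME_SIZE,
                                        packet, MAX_PACKET_SIZE);
        if (nbytes < 0) {
            fprintf(stderr, "opus_encode failed: %s\n", opus_strerror(nbytes));
            break;
        }

        /* Write nbytes bytes of packet to your file here. */
    }

    opus_encoder_destroy(encoder);
    return 0;
}

With the interleaved (L1, R1, L2, R2, ...) layout you already have, this should give you two clean channels on decode; the (L1 ... LN, R1 ... RN) layout you also tried is not what opus_encode expects.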