Understanding of MediaCodec and MediaExtractor


I want to do some processing on audio files without playing them, just math. I'm not sure I'm doing it right and have several questions. I read some examples, but most of them are about video streaming and none of them work with raw data at all.

  1. I prepared an mp3 file that has 2 identical channels, i.e. it is stereo but the left and the right are the same. After decoding I expected to get a buffer with pairs of equal numbers, because PCM-16 stores the channels' samples alternately, like {L R L R L R...}, right? E.g.:

    {105 105 601 601 -243 -243 -484 -484...}.

    But I get pairs of numbers that are close but not equal:

    {-308 -264 -1628 -1667 -2568 -2550 -4396 -4389}

    Do mp3 algorithms encode the same values differently, or is there another reason?

  2. I want to process data in packs of 1024 samples. If there are not enough samples for another pack, I want to save the rest until the next batch of raw data arrives (see mExcess in the code). Is there a guarantee that the order will be kept?

  3. I used to understand "sample" as every single value of audio data. Here I see the MediaExtractor::readSampleData and MediaExtractor::advance methods. The first returns ~2000 values, and the description of the second says "Advance to the next sample". Is this just a naming overlap? I saw a couple of examples where these methods are called as a pair in a loop. Is my usage correct?

Here is my code:

public static void foo(String filepath) throws IOException {
    final int SAMPLES_PER_CHUNK = 1024;

    MediaExtractor mediaExtractor = new MediaExtractor();
    mediaExtractor.setDataSource(filepath);
    MediaFormat mediaFormat = mediaExtractor.getTrackFormat(0);
    mediaExtractor.release();

    MediaCodecList mediaCodecList = new MediaCodecList(MediaCodecList.ALL_CODECS);
    mediaFormat.setString(MediaFormat.KEY_FRAME_RATE, null);
    String codecName = mediaCodecList.findDecoderForFormat(mediaFormat);
    mediaFormat.setInteger(MediaFormat.KEY_FRAME_RATE, 0);  // MediaCodec crashes with JNI
                                                            // error if FRAME_RATE is null
    MediaCodec mediaCodec = MediaCodec.createByCodecName(codecName);
    mediaCodec.setCallback(new MediaCodec.Callback() {
        private MediaExtractor mExtractor;
        private short[] mExcess;

        @Override
        public void onInputBufferAvailable(MediaCodec codec, int index) {
            if (mExtractor == null) {
                mExtractor = new MediaExtractor();
                try {
                    mExtractor.setDataSource(filepath);
                    mExtractor.selectTrack(0);
                } catch (IOException e) {
                    e.printStackTrace();
                }
                mExcess = new short[0];
            }
            ByteBuffer in = codec.getInputBuffer(index);
            in.clear();
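            int sampleSize = mExtractor.readSampleData(in, 0);
            if (sampleSize > 0) {
                // Read the timestamp before advance() moves on to the next sample
                long presentationTimeUs = mExtractor.getSampleTime();
                boolean isOver = !mExtractor.advance();
                codec.queueInputBuffer(
                        index,
                        0,
                        sampleSize,
                        presentationTimeUs,
                        isOver ? MediaCodec.BUFFER_FLAG_END_OF_STREAM : 0);
            } else {
                // Nothing left to read; end of stream was flagged with the last sample
            }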
        }

        @Override
        public void onOutputBufferAvailable(
                MediaCodec codec,
                int index,
                MediaCodec.BufferInfo info) {
            ByteBuffer tmp = codec.getOutputBuffer(index);
            // No early return for an empty buffer: it must still be released
            // and checked for the end-of-stream flag below

            ShortBuffer out = tmp.order(ByteOrder.nativeOrder()).asShortBuffer();
            // Prepend the remainder from previous batch to the new data
            short[] buf = new short[mExcess.length + out.limit()];
            System.arraycopy(mExcess, 0, buf, 0, mExcess.length);
            out.get(buf, mExcess.length, out.limit());

            final int channelCount
                    = codec.getOutputFormat().getInteger(MediaFormat.KEY_CHANNEL_COUNT);
            for (
                    int offset  = 0;
                    offset + SAMPLES_PER_CHUNK * channelCount <= buf.length;
                    offset += SAMPLES_PER_CHUNK * channelCount) {

                double[] x = new double[SAMPLES_PER_CHUNK];  // left channel
                double[] y = new double[SAMPLES_PER_CHUNK];  // right channel
                switch (channelCount) {
                    case 1:  // if 1 channel then make 2 identical arrays
                        for (int i = 0; i < SAMPLES_PER_CHUNK; ++i) {
                            x[i] = (double) buf[offset + i];
                            y[i] = (double) buf[offset + i];
                        }
                        break;
                    case 2:  // if 2 channels then read values alternately
                        for (int i = 0; i < SAMPLES_PER_CHUNK; ++i) {
                            x[i] = (double) buf[offset + i * 2];
                            y[i] = (double) buf[offset + i * 2 + 1];
                        }
                        break;
                    default:
                        throw new IllegalStateException("No algorithm for " + channelCount + " channels");
                }

                /// ... some processing ... ///
            }

            // Save the rest until next batch of raw data
            int samplesLeft = buf.length % (SAMPLES_PER_CHUNK * channelCount);
            mExcess = new short[samplesLeft];
            System.arraycopy(
                    buf,
                    buf.length - samplesLeft,
                    mExcess,
                    0,
                    samplesLeft);

            codec.releaseOutputBuffer(index, false);
            if ((info.flags & MediaCodec.BUFFER_FLAG_END_OF_STREAM) > 0) {
                codec.stop();
                codec.release();
                mExtractor.release();
            }
        }

        @Override
        public void onError(MediaCodec codec, MediaCodec.CodecException e) {

        }

        @Override
        public void onOutputFormatChanged(MediaCodec codec, MediaFormat format) {

        }
    });

    mediaFormat.setInteger(MediaFormat.KEY_PCM_ENCODING, AudioFormat.ENCODING_PCM_16BIT);
    mediaCodec.configure(mediaFormat, null, null, 0);
    mediaCodec.start();
}

Quick code review is also welcome.


Solution

    1. I'm not exactly sure why it would code them this way, but I think that small variance is within the expected tolerance. Keep in mind that because mp3 is a lossy codec, the decoder's output values won't be identical to the encoder's input; only the audible result needs to be close enough. That still doesn't explain why the two channels end up subtly different, though.

    2. Yes, decoded frames come out in the same order as in the source. The exact values won't match the original input, but the result should sound similar.

    3. In MediaExtractor, a sample is one encoded packet of data, which you should feed to the decoder as a unit. For mp3, one packet would typically decode to 1152 samples (per channel).
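
To illustrate points 2 and 3 together, here is a minimal sketch of regrouping decoder output into fixed-size packs. It assumes each decoded buffer has already been copied into a short[] of interleaved PCM-16 values, and that buffers arrive in order. PcmChunker, push and pending are hypothetical names made up for this example; they are not part of the Android API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper: regroups interleaved 16-bit PCM into fixed-size chunks. */
final class PcmChunker {
    private final int chunkLength;          // shorts per chunk = frames * channels
    private short[] excess = new short[0];  // leftover carried over from the previous call

    PcmChunker(int framesPerChunk, int channelCount) {
        this.chunkLength = framesPerChunk * channelCount;
    }

    /** Accepts one decoded buffer and returns every chunk it completes, in order. */
    List<short[]> push(short[] decoded) {
        // Prepend the leftover from the previous call to the new data
        short[] buf = Arrays.copyOf(excess, excess.length + decoded.length);
        System.arraycopy(decoded, 0, buf, excess.length, decoded.length);

        List<short[]> chunks = new ArrayList<>();
        int offset = 0;
        // "<=" so that data ending exactly on a chunk boundary is not dropped
        for (; offset + chunkLength <= buf.length; offset += chunkLength) {
            chunks.add(Arrays.copyOfRange(buf, offset, offset + chunkLength));
        }
        // Keep whatever did not fill a whole chunk for the next call
        excess = Arrays.copyOfRange(buf, offset, buf.length);
        return chunks;
    }

    /** Number of shorts currently held back for the next call. */
    int pending() {
        return excess.length;
    }
}
```

With stereo mp3 and 1152 samples per channel per packet, each push receives 2304 shorts while a 1024-sample pack needs 2048, so every call leaves a remainder that is carried into the next one. In the question's callback, the array filled from the ShortBuffer would be what gets passed to push.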