android opengl-es surfaceview android-mediacodec

MediaCodec: Convert image to video


I want to be able to write a bitmap to a video using MediaCodec. I want the video to be e.g. 3 seconds long and 30 fps. I am targeting Android API 21.

I have a class that does the drawing:

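import android.graphics.Bitmap;
import android.opengl.GLES20;

import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.FloatBuffer;

// OpenGlUtils is a helper class that is not shown here.
public class ImageRenderer {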
    private static final String NO_FILTER_VERTEX_SHADER = "" +
            "attribute vec4 position;\n" +
            "attribute vec4 inputTextureCoordinate;\n" +
            " \n" +
            "varying vec2 textureCoordinate;\n" +
            " \n" +
            "void main()\n" +
            "{\n" +
            "    gl_Position = position;\n" +
            "    textureCoordinate = inputTextureCoordinate.xy;\n" +
            "}";
    private static final String NO_FILTER_FRAGMENT_SHADER = "" +
            "varying highp vec2 textureCoordinate;\n" +
            " \n" +
            "uniform sampler2D inputImageTexture;\n" +
            " \n" +
            "void main()\n" +
            "{\n" +
            "     gl_FragColor = texture2D(inputImageTexture, textureCoordinate);\n" +
            "}";

    private int mGLProgId;
    private int mGLAttribPosition;
    private int mGLUniformTexture;
    private int mGLAttribTextureCoordinate;

    private static final int NO_IMAGE = -1;
    private static final float CUBE[] = {
            -1.0f, -1.0f,
            1.0f, -1.0f,
            -1.0f, 1.0f,
            1.0f, 1.0f,
    };

    private int mGLTextureId = NO_IMAGE;
    private final FloatBuffer mGLCubeBuffer;
    private final FloatBuffer mGLTextureBuffer;

    private Bitmap bitmap;

    private static final float TEXTURE_NO_ROTATION[] = {
            0.0f, 1.0f,
            1.0f, 1.0f,
            0.0f, 0.0f,
            1.0f, 0.0f,
    };

    public ImageRenderer(Bitmap bitmap) {
        this.bitmap = bitmap;

        mGLCubeBuffer = ByteBuffer.allocateDirect(CUBE.length * 4)
                .order(ByteOrder.nativeOrder())
                .asFloatBuffer();
        mGLCubeBuffer.put(CUBE).position(0);

        mGLTextureBuffer = ByteBuffer.allocateDirect(TEXTURE_NO_ROTATION.length * 4)
                .order(ByteOrder.nativeOrder())
                .asFloatBuffer();
        mGLTextureBuffer.put(TEXTURE_NO_ROTATION).position(0);

        GLES20.glClearColor(0, 0, 0, 1);
        GLES20.glDisable(GLES20.GL_DEPTH_TEST);

        mGLProgId = OpenGlUtils.loadProgram(NO_FILTER_VERTEX_SHADER, NO_FILTER_FRAGMENT_SHADER);
        mGLAttribPosition = GLES20.glGetAttribLocation(mGLProgId, "position");
        mGLUniformTexture = GLES20.glGetUniformLocation(mGLProgId, "inputImageTexture");
        mGLAttribTextureCoordinate = GLES20.glGetAttribLocation(mGLProgId,
                "inputTextureCoordinate");

        GLES20.glViewport(0, 0, bitmap.getWidth(), bitmap.getHeight());
        GLES20.glUseProgram(mGLProgId);
    }

    public void drawFrame() {
        GLES20.glClear(GLES20.GL_COLOR_BUFFER_BIT | GLES20.GL_DEPTH_BUFFER_BIT);

        // Draw bitmap
        mGLTextureId = OpenGlUtils.loadTexture(bitmap, mGLTextureId, false);

        GLES20.glUseProgram(mGLProgId);

        mGLCubeBuffer.position(0);
        GLES20.glVertexAttribPointer(mGLAttribPosition, 2, GLES20.GL_FLOAT, false, 0, mGLCubeBuffer);
        GLES20.glEnableVertexAttribArray(mGLAttribPosition);
        mGLTextureBuffer.position(0);
        GLES20.glVertexAttribPointer(mGLAttribTextureCoordinate, 2, GLES20.GL_FLOAT, false, 0,
                mGLTextureBuffer);
        GLES20.glEnableVertexAttribArray(mGLAttribTextureCoordinate);
        if (mGLTextureId != OpenGlUtils.NO_TEXTURE) {
            GLES20.glActiveTexture(GLES20.GL_TEXTURE0);
            GLES20.glBindTexture(GLES20.GL_TEXTURE_2D, mGLTextureId);
            GLES20.glUniform1i(mGLUniformTexture, 0);
        }
        GLES20.glDrawArrays(GLES20.GL_TRIANGLE_STRIP, 0, 4);
        GLES20.glDisableVertexAttribArray(mGLAttribPosition);
        GLES20.glDisableVertexAttribArray(mGLAttribTextureCoordinate);
        GLES20.glBindTexture(GLES20.GL_TEXTURE_2D, 0);
    }
}

I also have an InputSurface hooked up to my video encoder and muxer.

At the start of the processing, and then every time I successfully mux a frame thereafter, I call:

inputSurface.makeCurrent();
imageRenderer.drawFrame();
inputSurface.setPresentationTime(presentationTimeNs);
inputSurface.swapBuffers();
inputSurface.releaseEGLContext();

where inputSurface and imageRenderer are instances of the above classes, and presentationTimeNs I calculate based on the required frame rate.
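For reference, the timestamp calculation can be sketched roughly like this. This is a minimal sketch, not the exact code from my project: the class name FrameTimestamps and both method names are made up for illustration, and it assumes a constant frame rate.

```java
/** Hypothetical helper (names are illustrative) for constant-frame-rate timestamps. */
final class FrameTimestamps {
    private FrameTimestamps() {}

    /** Presentation time of frame {@code frameIndex} in nanoseconds. */
    static long presentationTimeNs(int frameIndex, int fps) {
        // Multiply before dividing so the rounding error does not accumulate
        // from frame to frame; the long literal keeps the product in 64 bits.
        return frameIndex * 1_000_000_000L / fps;
    }

    /** Number of frames needed for a clip of the given length. */
    static int frameCount(int durationSeconds, int fps) {
        return durationSeconds * fps;
    }
}
```

For a 3 second clip at 30 fps this gives 90 frames, and the value for frame index i is what gets passed to inputSurface.setPresentationTime().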

This generally works, but it feels pretty inefficient. I feel like I am unnecessarily redrawing the bitmap over and over, even though I know it hasn't changed. I tried calling drawFrame() just once or twice at the beginning, but then the output video flickers to black on all my Samsung test devices.

Is there a more efficient way I can draw the same bitmap over and over to my encoder input surface?


Solution

  • Efficiency-wise, drawing the input frame this way is probably about as efficient as it gets. Each time you submit a frame to the encoder, you can't really assume anything about the contents of the input surface buffer (I think), so you need to copy the content to be encoded into it somehow, and this code does pretty much exactly that.

    If you do skip drawing, keep in mind that the surface you're drawing into isn't a single buffer but a pool of buffers (usually around 4-10). That is likely why drawing only once or twice made the video flicker to black: the buffers you never drew into still get encoded. In the encoder's direct buffer access mode, the encoder tells you exactly which buffer from the pool it has given you to fill, so there you might have better luck skipping the drawing when you've already filled that buffer before (and hoping that the encoder hasn't invalidated its contents).

    With surface input, you don't get to know which buffer you are writing into. In that case you could, for example, try drawing only the first N times. I don't think you can get the actual number of buffers, though: you could try calling the deprecated getInputBuffers() method, but I don't think it can be used in combination with surface input.

    As for performance, though, by far the biggest issue is that you're doing everything synchronously. You said:

    "At the start of the processing, and then every time I successfully mux a frame thereafter, I call"

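    Hardware encoders generally have some latency: the time it takes to encode a single frame from start to finish is longer than the average time per frame you get when several frames are in flight at once. Waiting for each frame to be muxed before drawing the next one therefore makes every frame pay the full latency.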

    Assuming you're using MediaCodec in async mode, I would suggest simply rendering all 90 frames serially on one thread and writing output packets to the muxer as you receive them in the callback. That should keep the encoder pipeline busy. (Once the encoder's input buffers are exhausted, the inputSurface methods will block until the encoder has completed a frame and freed up another input buffer.) You might also want to buffer the output packets in a queue and write them to the muxer asynchronously (I remember reading about cases where MediaMuxer can occasionally block for longer than you'd like).
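
The output queue idea could look roughly like the sketch below. It is plain Java so that it runs anywhere. PacketSink and PacketQueue are hypothetical names, not Android APIs: on a device the sink would presumably wrap MediaMuxer.writeSampleData(), and enqueue() would be called from the codec's onOutputBufferAvailable() callback just before the output buffer is released. The unbounded queue is also an assumption, one that seems acceptable for a clip of about 90 frames.

```java
import java.nio.ByteBuffer;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

/** Hypothetical stand-in for the muxer; on Android it would wrap MediaMuxer.writeSampleData(). */
interface PacketSink {
    void write(ByteBuffer data, long presentationTimeUs);
}

/** Buffers encoded packets so the encoder callback never waits on a slow muxer. */
final class PacketQueue {
    private static final class Packet {
        final ByteBuffer data;
        final long presentationTimeUs;

        Packet(ByteBuffer data, long presentationTimeUs) {
            this.data = data;
            this.presentationTimeUs = presentationTimeUs;
        }
    }

    // Sentinel that tells the writer thread to stop after draining the queue.
    private static final Packet END = new Packet(ByteBuffer.allocate(0), -1L);

    private final BlockingQueue<Packet> queue = new LinkedBlockingQueue<>();
    private final Thread writer;

    PacketQueue(PacketSink sink) {
        writer = new Thread(() -> {
            try {
                // Drain packets in arrival order until the sentinel shows up.
                for (Packet p = queue.take(); p != END; p = queue.take()) {
                    sink.write(p.data, p.presentationTimeUs);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "muxer-writer");
        writer.start();
    }

    /** Copies the packet so the caller can release the codec's buffer right away. */
    void enqueue(ByteBuffer encoded, long presentationTimeUs) {
        ByteBuffer copy = ByteBuffer.allocate(encoded.remaining());
        copy.put(encoded.duplicate()); // duplicate() leaves the caller's position untouched
        copy.flip();
        queue.add(new Packet(copy, presentationTimeUs));
    }

    /** Signals end of stream and waits until every queued packet has been written. */
    void finish() {
        queue.add(END);
        try {
            writer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

The rendering thread then just loops over all the frames without waiting for output, and finish() is called after the encoder has signalled end of stream, before the muxer is stopped.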