Tags: android, android-mediacodec, surface

How is the output Surface of a Decoder passed to the input Surface of an Encoder?


I'm trying to understand how the surface-to-surface approach works with MediaCodec. In a ByteBuffer-only approach, decoded data is placed in OutputBuffers. This non-encoded data can be processed manually and then passed to the InputBuffers of an Encoder.

If we look at an example from the Android MediaCodec CTS that uses a surface-to-surface approach to pass data between a decoder and an encoder, we configure the Decoder to output the decoded data onto a Surface called outputSurface, and we configure the Encoder to receive the data on a Surface called inputSurface.
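For reference, here is a minimal sketch of how that surface-to-surface wiring is typically set up; the format values and the decoderFormat/outputSurface variables are assumptions, not the exact CTS code:

import android.media.MediaCodec;
import android.media.MediaCodecInfo;
import android.media.MediaFormat;
import android.view.Surface;

// Encoder: configure first, then ask it for an input Surface (this replaces its input buffers).
MediaFormat encoderFormat = MediaFormat.createVideoFormat("video/avc", 1280, 720);
encoderFormat.setInteger(MediaFormat.KEY_COLOR_FORMAT,
        MediaCodecInfo.CodecCapabilities.COLOR_FormatSurface);
encoderFormat.setInteger(MediaFormat.KEY_BIT_RATE, 2_000_000);
encoderFormat.setInteger(MediaFormat.KEY_FRAME_RATE, 30);
encoderFormat.setInteger(MediaFormat.KEY_I_FRAME_INTERVAL, 1);

MediaCodec encoder = MediaCodec.createEncoderByType("video/avc");
encoder.configure(encoderFormat, null, null, MediaCodec.CONFIGURE_FLAG_ENCODE);
Surface inputSurface = encoder.createInputSurface();   // must be called between configure() and start()

// Decoder: configure it to render its output onto a Surface.
// decoderFormat would come from MediaExtractor.getTrackFormat(...), and outputSurface is a
// Surface backed by a SurfaceTexture (the CTS OutputSurface helper builds one).
MediaCodec decoder = MediaCodec.createDecoderByType("video/avc");
decoder.configure(decoderFormat, outputSurface, null, 0);

encoder.start();
decoder.start();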

In the documentation, createInputSurface() and the usage of this Surface in the configuration of the Encoder are described as follows:

createInputSurface(): Requests a Surface to use as the input to an encoder, in place of input buffers.

In other words, and this is visible in the ByteBuffer declarations of the CTS example: there are simply no InputBuffers for the Encoder. You have:

  • DecoderInputBuffers (receive the video track samples from the MediaExtractor)
  • DecoderOutputBuffers (output to pull decoded YUV frames; in surface mode these are released straight onto the output Surface, as in the sketch after this list)
  • Nothing. (Well... The input Surface.)
  • EncoderOutputBuffers (output to pull the re-encoded stuff to pass to a muxer)
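For context, on the decoder side the output buffers are still drained the usual way; the only difference is that each buffer is released with render set to true, which sends the decoded frame to the Surface the decoder was configured with instead of handing you YUV bytes. A rough sketch (timeout value assumed, decoder being the MediaCodec from the decoding side):

MediaCodec.BufferInfo info = new MediaCodec.BufferInfo();
int index = decoder.dequeueOutputBuffer(info, 10_000 /* timeout in microseconds */);
if (index >= 0) {
    boolean render = info.size != 0;      // don't render empty or codec-config buffers
    // With render == true the frame goes to the SurfaceTexture-backed outputSurface;
    // no ByteBuffer is copied in application code.
    decoder.releaseOutputBuffer(index, render);
}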

Instead of enqueuing data into the Encoder's InputBuffers, you have these lines of code:

outputSurface.awaitNewImage();   // wait for the decoder to render a frame onto the SurfaceTexture
outputSurface.drawImage();       // draw that frame with OpenGL onto the current EGL surface (the encoder's input Surface)
inputSurface.setPresentationTime(videoDecoderOutputBufferInfo.presentationTimeUs * 1000);  // timestamp in nanoseconds
inputSurface.swapBuffers();      // submit the rendered frame to the encoder

How is the outputSurface content of the Decoder passed to the inputSurface of the Encoder? What is concretely happening behind the scenes?


Solution

  • The decoder's output Surface and the encoder's input Surface are each a specially configured piece of memory (physically contiguous, reserved, etc.) that specialised hardware (for example GPUs or hardware-accelerated codecs) or software modules can use in the way best suited to the performance needs, through features such as hardware acceleration and DMA.

    More specifically, in this context the decoder's output Surface is backed by a SurfaceTexture, so the decoded frame can be used as an external texture in an OpenGL environment, processed in any way you like, and then rendered onto the Surface from which the encoder reads and encodes to create the final video frame. (A rough sketch of this setup is given at the end of this answer.)

    Not coincidentally, OpenGL can only render to such a Surface.

    So the decoder acts as the provider of raw video frames, the Surface(Texture) is the carrier, and OpenGL is the medium that renders them onto the Encoder's input Surface, which is the destination of the to-be-encoded video frames.

    To further satiate your curiosity, check Editing frames and encoding with MediaCodec for more details.

    [Edit]

    You can check the grafika subprojects Continuous Camera or Show + capture camera, which render Camera frames (fed to a SurfaceTexture) to a video (and to the display). So essentially, the only change is that MediaCodec feeds the frames into the SurfaceTexture instead of the Camera.

    Google's CTS DecodeEditEncodeTest does exactly the same thing and can be used as a reference to make the learning curve smoother.

    To start from the very basics, as fadden pointed out, go through the Android graphics tutorials.
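    To make the SurfaceTexture/external-texture path above more concrete, here is a rough sketch; the EGL context/surface setup and the actual quad drawing are omitted, and variable names such as eglDisplay, eglEncoderSurface and presentationTimeNs are assumptions (grafika and DecodeEditEncodeTest contain complete versions):

import android.graphics.SurfaceTexture;
import android.opengl.EGL14;
import android.opengl.EGLExt;
import android.opengl.GLES11Ext;
import android.opengl.GLES20;
import android.view.Surface;

// Create an external OES texture and wrap it in a SurfaceTexture; the Surface built on top
// of it is what the decoder is configured with.
int[] tex = new int[1];
GLES20.glGenTextures(1, tex, 0);
GLES20.glBindTexture(GLES11Ext.GL_TEXTURE_EXTERNAL_OES, tex[0]);
SurfaceTexture surfaceTexture = new SurfaceTexture(tex[0]);
Surface outputSurface = new Surface(surfaceTexture);   // pass this to decoder.configure(...)

// Each time the decoder renders a frame onto that Surface (onFrameAvailable fires):
float[] stMatrix = new float[16];
surfaceTexture.updateTexImage();                 // latch the new frame into the external texture
surfaceTexture.getTransformMatrix(stMatrix);
// ... draw a full-screen quad sampling the texture (samplerExternalOES in the fragment
// shader) while the EGL surface created from the encoder's input Surface is current ...
EGLExt.eglPresentationTimeANDROID(eglDisplay, eglEncoderSurface, presentationTimeNs);
EGL14.eglSwapBuffers(eglDisplay, eglEncoderSurface);   // hands the frame to the encoder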