
Google Mobile Vision not working with YUV data?


When I pass a bitmap to Frame.Builder() using setBitmap, everything works as expected: faces are detected and the smile probability works too. However, .setImageData with a ByteBuffer holding an NV21-format YUV image doesn't work. It doesn't throw any errors, but no faces are detected in the YuvImage data.

The i420Frame is from here: https://media.twiliocdn.com/sdk/android/conversations/releases/0.8.1/docs/com/twilio/conversations/I420Frame.html

(I'm assuming it's very similar to the I420Frame from WebRTC, if not exactly the same.)

Here is the main code:

    @Override
    public void renderFrame(final I420Frame i420Frame) {
        YuvImage yuvImage = i420ToYuvImage(i420Frame.yuvPlanes, i420Frame.yuvStrides, i420Frame.width, i420Frame.height);

        // Set image data (YUV NV21 format) -- NOT working. The commented-out bitmap line works.
        Frame frame = new Frame.Builder().setImageData(ByteBuffer.wrap(yuvImage.getYuvData()), yuvImage.getWidth(), yuvImage.getHeight(), yuvImage.getYuvFormat()).build();
        //Frame frame = new Frame.Builder().setBitmap(yuvImage).build();

        // Detect faces
        SparseArray<Face> faces = detector.detect(frame);

        if (!detector.isOperational()) {
            Log.e(TAG, "Detector is not operational!");
        }

        if (faces.size() > 0) {
            Log.i("yuv", "Smiling %: " + faces.valueAt(0).getIsSmilingProbability());
        }

        i420Frame.release();
        Log.i("yuv", "Faces detected: " + faces.size());
    }

Here are the functions used for i420ToYuvImage (taken from Twilio's quickstart guides). I did not write the I420-to-YuvImage code, but I trust it works: I took the resulting YuvImage, converted it to a JPEG, converted that to a Bitmap, and the Google Mobile Vision library detected faces in the bitmap. All those conversions add a lot of overhead, though, so I am trying to feed the YuvImage data directly into the Mobile Vision library as shown above.

            private YuvImage i420ToYuvImage(ByteBuffer[] yuvPlanes, int[] yuvStrides, int width, int height) {
                if (yuvStrides[0] != width) {
                    return fastI420ToYuvImage(yuvPlanes, yuvStrides, width, height);
                }
                if (yuvStrides[1] != width / 2) {
                    return fastI420ToYuvImage(yuvPlanes, yuvStrides, width, height);
                }
                if (yuvStrides[2] != width / 2) {
                    return fastI420ToYuvImage(yuvPlanes, yuvStrides, width, height);
                }

                byte[] bytes = new byte[yuvStrides[0] * height +
                        yuvStrides[1] * height / 2 +
                        yuvStrides[2] * height / 2];
                ByteBuffer tmp = ByteBuffer.wrap(bytes, 0, width * height);
                copyPlane(yuvPlanes[0], tmp);

                byte[] tmpBytes = new byte[width / 2 * height / 2];
                tmp = ByteBuffer.wrap(tmpBytes, 0, width / 2 * height / 2);

                copyPlane(yuvPlanes[2], tmp);
                for (int row = 0 ; row < height / 2 ; row++) {
                    for (int col = 0 ; col < width / 2 ; col++) {
                        bytes[width * height + row * width + col * 2]
                                = tmpBytes[row * width / 2 + col];
                    }
                }
                copyPlane(yuvPlanes[1], tmp);
                for (int row = 0 ; row < height / 2 ; row++) {
                    for (int col = 0 ; col < width / 2 ; col++) {
                        bytes[width * height + row * width + col * 2 + 1] =
                                tmpBytes[row * width / 2 + col];
                    }
                }
                return new YuvImage(bytes, NV21, width, height, null);
            }

            private YuvImage fastI420ToYuvImage(ByteBuffer[] yuvPlanes,
            int[] yuvStrides,
            int width,
            int height) {
                byte[] bytes = new byte[width * height * 3 / 2];
                int i = 0;
                for (int row = 0 ; row < height ; row++) {
                    for (int col = 0 ; col < width ; col++) {
                        bytes[i++] = yuvPlanes[0].get(col + row * yuvStrides[0]);
                    }
                }
                for (int row = 0 ; row < height / 2 ; row++) {
                    for (int col = 0 ; col < width / 2; col++) {
                        bytes[i++] = yuvPlanes[2].get(col + row * yuvStrides[2]);
                        bytes[i++] = yuvPlanes[1].get(col + row * yuvStrides[1]);
                    }
                }
                return new YuvImage(bytes, NV21, width, height, null);
            }

            private void copyPlane(ByteBuffer src, ByteBuffer dst) {
                src.position(0).limit(src.capacity());
                dst.put(src);
                dst.position(0).limit(dst.capacity());
            }
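
For reference, the repacking that the two functions above perform can be sketched without any Android classes. This is only an illustration, not Twilio's code: the class name `I420ToNv21` and the method `convert` are made up here, and the sketch assumes even width and height and planes ordered Y, U, V as in `yuvPlanes`. It copies the Y plane row by row (skipping stride padding), then writes the chroma as interleaved V,U pairs, which is the NV21 layout.

```java
import java.nio.ByteBuffer;

// Hypothetical helper, not part of the Twilio or Mobile Vision APIs.
final class I420ToNv21 {

    /**
     * Repacks planar I420 data (separate Y, U and V planes, each with its own
     * row stride) into one tightly packed NV21 array: all Y bytes first,
     * followed by interleaved V,U pairs at half resolution.
     * Assumes width and height are even.
     */
    static byte[] convert(ByteBuffer[] planes, int[] strides, int width, int height) {
        if (width % 2 != 0 || height % 2 != 0) {
            throw new IllegalArgumentException("width and height must be even");
        }
        byte[] out = new byte[width * height * 3 / 2];
        int i = 0;

        // Luma: copy only the first `width` bytes of each row, skipping padding.
        for (int row = 0; row < height; row++) {
            for (int col = 0; col < width; col++) {
                out[i++] = planes[0].get(row * strides[0] + col);
            }
        }

        // Chroma: NV21 stores V before U for every 2x2 block of pixels.
        for (int row = 0; row < height / 2; row++) {
            for (int col = 0; col < width / 2; col++) {
                out[i++] = planes[2].get(row * strides[2] + col); // V
                out[i++] = planes[1].get(row * strides[1] + col); // U
            }
        }
        return out;
    }
}
```

The returned array has the same layout as the `bytes` array passed to `new YuvImage(bytes, NV21, width, height, null)` above, so in principle it could be wrapped in a ByteBuffer for `setImageData` directly.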

Solution

  • It turns out the I420Frame that Twilio passes to renderFrame() is rotated by 270 degrees, so calling .setRotation on the Frame.Builder fixed the issue. I had actually tried that before, but I was calling .setRotation(270), which intuitively made sense to me. After checking the documentation, I found that setRotation expects one of the Frame.ROTATION_* constants, such as Frame.ROTATION_270, rather than an angle in degrees. It all works now. Here is the complete working line:

    Frame frame = new Frame.Builder().setImageData(ByteBuffer.wrap(yuvImage.getYuvData()), yuvImage.getWidth(), yuvImage.getHeight(), yuvImage.getYuvFormat()).setRotation(Frame.ROTATION_270).build();
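
If the rotation is only known in degrees, a small mapping helper avoids the mistake described above. The sketch below is an assumption-laden illustration: `FrameRotation` and `fromDegrees` are hypothetical names, and the local constants are stand-ins whose values (0 to 3) are assumed to mirror `Frame.ROTATION_0` through `Frame.ROTATION_270`. In a real app, return the `Frame.ROTATION_*` constants themselves rather than these copies.

```java
// Hypothetical helper; not part of the Mobile Vision API.
final class FrameRotation {
    // Stand-ins for Frame.ROTATION_0 .. Frame.ROTATION_270 (values assumed).
    static final int ROTATION_0 = 0;
    static final int ROTATION_90 = 1;
    static final int ROTATION_180 = 2;
    static final int ROTATION_270 = 3;

    /** Maps an angle in degrees (any multiple of 90) to a rotation constant. */
    static int fromDegrees(int degrees) {
        // Normalize into [0, 360) so negative angles such as -90 also work.
        int normalized = ((degrees % 360) + 360) % 360;
        switch (normalized) {
            case 0:   return ROTATION_0;
            case 90:  return ROTATION_90;
            case 180: return ROTATION_180;
            case 270: return ROTATION_270;
            default:
                throw new IllegalArgumentException(
                        "rotation must be a multiple of 90 degrees: " + degrees);
        }
    }
}
```

With such a helper, the call would read `.setRotation(FrameRotation.fromDegrees(270))` instead of passing the raw angle.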