Tags: android, opencv, tensorflow, arcore, tensorflow-lite

Image classification through TensorFlow gives the exact same prediction


Bear with me while I try to elucidate.

I have an Android application which uses OpenCV to convert a YUV420 image into a bitmap and then transfers it to an Interpreter. The problem is, every time I run it, I get the exact same class prediction with the exact same confidence values, regardless of what I point the camera at.

...
Recognitions : [macbook pro: 0.95353276, cello gripper: 0.023749515]. 
Recognitions : [macbook pro: 0.95353276, cello gripper: 0.023749515]. 
Recognitions : [macbook pro: 0.95353276, cello gripper: 0.023749515]. 
Recognitions : [macbook pro: 0.95353276, cello gripper: 0.023749515].
...
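
For context, the Interpreter itself is constructed elsewhere in the app; a typical setup memory-maps the .tflite file, roughly like the sketch below (the asset name model.tflite and loading from assets are assumptions, and assets implies a Context is available):

// Hypothetical interpreter setup (filename "model.tflite" is an assumption):
// memory-map the model from the app's assets and hand it to a TFLite Interpreter.
val fd = assets.openFd("model.tflite")
val model = FileInputStream(fd.fileDescriptor).channel
    .map(FileChannel.MapMode.READ_ONLY, fd.startOffset, fd.declaredLength)
val interpreter = Interpreter(model)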

Now, before you suggest my model isn't trained enough: I've tested the exact same .tflite file in the TFLite example provided in the TensorFlow Codelab-2, and it works as it should, recognizing all 4 of my classes with 90%+ accuracy. In addition, I used the label_image.py script to test the .pb file from which my .tflite was derived, and it also works as it should. I've trained the model on 5000+ images of each class. Since it works in other apps, I'm guessing the problem isn't the model but my implementation, though I just can't pinpoint it.

The following code is used to create Mat(s) from the image bytes:

//Retrieve the camera Image from ARCore
val cameraImage = frame.acquireCameraImage()
val cameraPlaneY = cameraImage.planes[0].buffer
val cameraPlaneUV = cameraImage.planes[1].buffer

// Create a new Mat with OpenCV. One for each plane - Y and UV
val y_mat = Mat(cameraImage.height, cameraImage.width, CvType.CV_8UC1, cameraPlaneY)
val uv_mat = Mat(cameraImage.height / 2, cameraImage.width / 2, CvType.CV_8UC2, cameraPlaneUV)
var mat224 = Mat()
var cvFrameRGBA = Mat()

// Retrieve an RGBA frame from the produced YUV
Imgproc.cvtColorTwoPlane(y_mat, uv_mat, cvFrameRGBA, Imgproc.COLOR_YUV2BGRA_NV21)
// I've tried the following in the above line
// Imgproc.COLOR_YUV2RGBA_NV12
// Imgproc.COLOR_YUV2RGBA_NV21
// Imgproc.COLOR_YUV2BGRA_NV12
// Imgproc.COLOR_YUV2BGRA_NV21
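
The resize of the converted frame down to the network's 224×224 input isn't shown above; a minimal sketch of that step, assuming the mat224 declared earlier and OpenCV's Imgproc.resize (with org.opencv.core.Size), might look like this:

// Hypothetical resize step: shrink the converted RGBA frame to the 224x224 model input
Imgproc.resize(cvFrameRGBA, mat224, Size(224.0, 224.0))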

The following code is used to add image data into a ByteBuffer:

// imageFrame is a Mat object created from OpenCV by processing a YUV420 image received from ARCore
override fun setImageFrame(imageFrame: Mat) {
    ...
    // Convert mat224 into a float array that can be sent to Tensorflow
    val rgbBytes: ByteBuffer = ByteBuffer.allocate(1 * 4 * 224 * 224 * 3)
    rgbBytes.order(ByteOrder.nativeOrder())

    val frameBitmap = Bitmap.createBitmap(imageFrame.cols(), imageFrame.rows(), Bitmap.Config.ARGB_8888, true)
    // convert Mat to Bitmap
    Utils.matToBitmap(imageFrame, frameBitmap, true)
    frameBitmap.getPixels(intValues, 0, frameBitmap.width, 0, 0, frameBitmap.width, frameBitmap.height)

    // Iterate over all pixels and retrieve information of RGB channels
    intValues.forEach { packedPixel ->
        rgbBytes.putFloat((((packedPixel shr 16) and 0xFF) - 128) / 128.0f)
        rgbBytes.putFloat((((packedPixel shr 8) and 0xFF) - 128) / 128.0f)
        rgbBytes.putFloat(((packedPixel and 0xFF) - 128) / 128.0f)
    }
}

.......
private var labelProb: Array<FloatArray>? = null
.......
// and classify 
labelProb?.let { interpreter?.run(rgbBytes, it) }
.......
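
The allocation of labelProb is elided above; for a four-class float model it would typically be sized to match the interpreter's output tensor, roughly like this (the [1, 4] shape is an assumption based on the four classes mentioned earlier):

// Hypothetical initialization: one row per batch entry, one slot per class
labelProb = Array(1) { FloatArray(4) }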

I checked the bitmap that gets converted from the Mat. It looks as good as it possibly can.
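
For anyone wanting to reproduce that check, one way is to dump the intermediate bitmap to disk and inspect it; a minimal sketch, assuming access to a Context (for filesDir) and the frameBitmap from setImageFrame above:

// Write the intermediate bitmap to a PNG so it can be inspected manually
FileOutputStream(File(filesDir, "debug_frame.png")).use { out ->
    frameBitmap.compress(Bitmap.CompressFormat.PNG, 100, out)
}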

Any ideas anyone?

Update One

I changed the implementation of the setImageFrame method slightly to match an implementation here. Since it works for its author, I hoped it would work for me as well. It still doesn't.

override fun setImageFrame(imageFrame: Mat) {
    // Reset the rgb bytes buffer
    rgbBytes.rewind()

    // Iterate over all pixels and retrieve information of RGB channels only
    for(rows in 0 until imageFrame.rows())
        for(cols in 0 until imageFrame.cols()) {
            val imageData = imageFrame.get(rows, cols)
            // Mat type is 24, i.e. CV_8UC4: 4 channels, 8-bit unsigned depth (0)
            rgbBytes.putFloat(imageData[0].toFloat())
            rgbBytes.putFloat(imageData[1].toFloat())
            rgbBytes.putFloat(imageData[2].toFloat())
        }
}

Update Two

Suspicious of my float model, I swapped in a pre-built MobileNet Quant model just to rule out that possibility. The problem persists with it as well.

...
Recognitions : [candle: 18.0, otterhound: 15.0, syringe: 13.0, English foxhound: 11.0]
Recognitions : [candle: 18.0, otterhound: 15.0, syringe: 13.0, English foxhound: 11.0]
Recognitions : [candle: 18.0, otterhound: 15.0, syringe: 13.0, English foxhound: 11.0]
Recognitions : [candle: 18.0, otterhound: 15.0, syringe: 13.0, English foxhound: 11.0]
...
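
As an aside, a pre-built quantized MobileNet typically takes uint8 input and produces uint8 scores (which would explain why the confidences above print as whole numbers), so the buffer for that experiment is sized and filled differently from the float case; a minimal sketch, assuming a stock 224×224 MobileNet Quant with 1001 ImageNet labels:

// Sketch of the input/output plumbing for a quantized model (assumed uint8 in and out)
val quantBytes: ByteBuffer = ByteBuffer.allocateDirect(1 * 224 * 224 * 3) // one byte per channel, no floats
quantBytes.order(ByteOrder.nativeOrder())
intValues.forEach { packedPixel ->
    quantBytes.put(((packedPixel shr 16) and 0xFF).toByte()) // R
    quantBytes.put(((packedPixel shr 8) and 0xFF).toByte())  // G
    quantBytes.put((packedPixel and 0xFF).toByte())          // B
}
val quantProb = Array(1) { ByteArray(1001) } // assumed 1001-class ImageNet output
interpreter?.run(quantBytes, quantProb)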

Solution

  • Okay, so after 4 days I was finally able to solve this. The issue was how the ByteBuffer is initialized. I was doing:

    private var rgbBytes: ByteBuffer = ByteBuffer.allocate(1 * 4 * 224 * 224 * 3)
    

    instead of what I ought to have been doing:

    private val rgbBytes: ByteBuffer = ByteBuffer.allocateDirect(1 * 4 * 224 * 224 * 3)
    

    I tried to understand the difference between ByteBuffer.allocate() and ByteBuffer.allocateDirect() here, but to no avail.

    I'd be glad if someone could answer two further questions:

    1. Why does TensorFlow need a direct ByteBuffer rather than a non-direct buffer?
    2. What is the difference between a direct and a non-direct ByteBuffer, in simple terms?