In the TensorFlow Lite Android demo code for image classification, the images are first converted to ByteBuffer format for better performance. This conversion from bitmap to floating point format, and the subsequent copy into a byte buffer, seems to be an expensive operation (loops, bitwise operators, float mem-copy etc.). We tried to implement the same logic with OpenCV to gain some speed advantage. The following code works without error, but due to some logical error in this conversion the output of the model (to which this data is fed) seems to be incorrect. The input of the model is supposed to be RGB with data type float[1,197,197,3].
How can we speed up this process of bitmap to byte buffer conversion using OpenCV (or any other means)?
Standard Bitmap to ByteBuffer Conversion:-
/** Writes Image data into a {@code ByteBuffer}. */
private void convertBitmapToByteBuffer(Bitmap bitmap) {
    if (imgData == null) {
        return;
    }
    imgData.rewind();
    bitmap.getPixels(intValues, 0, bitmap.getWidth(), 0, 0, bitmap.getWidth(), bitmap.getHeight());
    long startTime = SystemClock.uptimeMillis();
    // Convert the image to floating point.
    int pixel = 0;
    for (int i = 0; i < getImageSizeX(); ++i) {
        for (int j = 0; j < getImageSizeY(); ++j) {
            final int val = intValues[pixel++];
            imgData.putFloat(((val >> 16) & 0xFF) / 255.f); // R
            imgData.putFloat(((val >> 8) & 0xFF) / 255.f);  // G
            imgData.putFloat((val & 0xFF) / 255.f);         // B
        }
    }
    long endTime = SystemClock.uptimeMillis();
    Log.d(TAG, "Timecost to put values into ByteBuffer: " + Long.toString(endTime - startTime));
}
OpenCV Bitmap to ByteBuffer Conversion:-
/** Writes Image data into a {@code ByteBuffer}. */
private void convertBitmapToByteBuffer(Bitmap bitmap) {
    if (imgData == null) {
        return;
    }
    imgData.rewind();
    bitmap.getPixels(intValues, 0, bitmap.getWidth(), 0, 0, bitmap.getWidth(), bitmap.getHeight());
    long startTime = SystemClock.uptimeMillis();
    Mat bufmat = new Mat(197, 197, CV_8UC3);
    Mat newmat = new Mat(197, 197, CV_32FC3);
    Utils.bitmapToMat(bitmap, bufmat);  // produces an RGBA Mat
    Imgproc.cvtColor(bufmat, bufmat, Imgproc.COLOR_RGBA2RGB);
    List<Mat> sp_im = new ArrayList<Mat>(3);
    Core.split(bufmat, sp_im);
    sp_im.get(0).convertTo(sp_im.get(0), CV_32F, 1.0/255/0);
    sp_im.get(1).convertTo(sp_im.get(1), CV_32F, 1.0/255.0);
    sp_im.get(2).convertTo(sp_im.get(2), CV_32F, 1.0/255.0);
    Core.merge(sp_im, newmat);
    // bufmat.convertTo(newmat, CV_32FC3, 1.0/255.0);
    float buf[] = new float[197 * 197 * 3];
    newmat.get(0, 0, buf);
    // imgData.wrap(buf).order(ByteOrder.nativeOrder()).getFloat();
    imgData.order(ByteOrder.nativeOrder()).asFloatBuffer().put(buf);
    long endTime = SystemClock.uptimeMillis();
    Log.d(TAG, "Timecost to put values into ByteBuffer: " + Long.toString(endTime - startTime));
}
Answer:-
255/0 in your code is a copy/paste mistake, not real code. Also note that this conversion may not be worth optimizing: for mobilenet_v1_1.0_224, the naïve float buffer preparation was less than 5% of inference time.
One way to avoid the float conversion entirely is to use a quantized model. I converted the .tflite file from .h5 with quantization enabled. There could actually be three quantization operations, but I only used two: --inference_input_type=QUANTIZED_UINT8 and --post_training_quantize.
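For reference, a rough sketch of that conversion step with the TF 1.x tflite_convert tool; model.h5 and model.tflite are placeholder file names here, and the exact flag set may vary with your TensorFlow version:
tflite_convert \
    --keras_model_file=model.h5 \
    --output_file=model.tflite \
    --inference_input_type=QUANTIZED_UINT8 \
    --post_training_quantize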
With a quantized model, instead of imgData.putFloat(((val >> 16) & 0xFF) / 255.f) we write imgData.put((byte) ((val >> 16) & 0xFF)), and so on.
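A minimal sketch of that quantized input loop, assuming the same intValues array as in your code and an imgData ByteBuffer sized width * height * 3 bytes:
int pixel = 0;
for (int i = 0; i < getImageSizeX(); ++i) {
    for (int j = 0; j < getImageSizeY(); ++j) {
        final int val = intValues[pixel++];
        // One byte per channel; no float math or scaling needed.
        imgData.put((byte) ((val >> 16) & 0xFF)); // R
        imgData.put((byte) ((val >> 8) & 0xFF));  // G
        imgData.put((byte) (val & 0xFF));         // B
    }
}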
By the way, I don't think that your formulae are correct. To achieve best accuracy when float32 buffers are involved, we use putFloat(byteval / 256f), where byteval is an int in range [0:255].
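Applied to the loop in your question, that correction would look as follows (only the scaling constant changes):
imgData.putFloat(((val >> 16) & 0xFF) / 256f); // R
imgData.putFloat(((val >> 8) & 0xFF) / 256f);  // G
imgData.putFloat((val & 0xFF) / 256f);         // B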