Tags: java, android, kotlin, firebase-mlkit, android-camerax

How to set a box on a CameraX preview so it can be processed using ImageAnalysis in Java?


I have been working on an app that needs CameraX for its preview stream, but it also needs a box-style overlay from which the text will be decoded. I have successfully implemented the preview, but I can't seem to find a way to implement such an overlay without using any third-party library; right now we decode text from the entire screen. I have seen code that does just this in a Codelabs tutorial (link), but it's in Kotlin and I can't decipher it. If anyone can help me do this without a third-party library, it would be great. Thanks in advance.

My XML code:

<androidx.camera.view.PreviewView
    android:id="@+id/previewView"
    android:layout_width="match_parent"
    android:layout_height="675dp"
    app:layout_constraintStart_toStartOf="parent"
    app:layout_constraintTop_toBottomOf="@+id/toolbar" />

My camera logic:

PreviewView mCameraView;
Camera camera;
private ListenableFuture<ProcessCameraProvider> cameraProviderFuture;
private final ExecutorService executor = Executors.newSingleThreadExecutor();

void startCamera() {
    mCameraView = findViewById(R.id.previewView);

    cameraProviderFuture = ProcessCameraProvider.getInstance(this);

    cameraProviderFuture.addListener(() -> {
        try {
            ProcessCameraProvider cameraProvider = cameraProviderFuture.get();
            bindPreview(cameraProvider);
        } catch (ExecutionException | InterruptedException e) {
            // No errors need to be handled for this Future.
            // This should never be reached.
        }
    }, ContextCompat.getMainExecutor(this));
}



void bindPreview(@NonNull ProcessCameraProvider cameraProvider) {
    Preview preview = new Preview.Builder()
            .setTargetResolution(BestSize())
            .build();

    CameraSelector cameraSelector = new CameraSelector.Builder()
            .requireLensFacing(CameraSelector.LENS_FACING_BACK)
            .build();

    preview.setSurfaceProvider(mCameraView.createSurfaceProvider());

    ImageAnalysis imageAnalysis = new ImageAnalysis.Builder()
            .setTargetResolution(new Size(4000, 5000))
            .setBackpressureStrategy(ImageAnalysis.STRATEGY_KEEP_ONLY_LATEST)
            .build();

    imageAnalysis.setAnalyzer(executor, image -> {
        frames++;
        int rotationDegrees = degreesToFirebaseRotation(image.getImageInfo().getRotationDegrees());

        Image mediaImage = image.getImage();
        if (mediaImage == null) {
            image.close();
            return;
        }

        FirebaseVisionImage firebaseVisionImage =
                FirebaseVisionImage.fromMediaImage(mediaImage, rotationDegrees);

        FirebaseVisionTextRecognizer detector =
                FirebaseVision.getInstance().getOnDeviceTextRecognizer();

        detector.processImage(firebaseVisionImage)
                .addOnSuccessListener(firebaseVisionText -> {
                    // Task completed successfully
                    String text = firebaseVisionText.getText();
                    if (!text.isEmpty()) {
                        if (firstValidFrame == 0)
                            firstValidFrame = frames;
                        validFrames++;
                    }
                    mTextView.setText(text);
                    image.close();
                })
                .addOnFailureListener(e -> {
                    Log.e("Error", e.toString());
                    image.close();
                });
    });

    camera = cameraProvider.bindToLifecycle(this, cameraSelector, imageAnalysis, preview);
}

private int degreesToFirebaseRotation(int degrees) {
  switch (degrees) {
      case 0:
          return FirebaseVisionImageMetadata.ROTATION_0;
      case 90:
          return FirebaseVisionImageMetadata.ROTATION_90;
      case 180:
          return FirebaseVisionImageMetadata.ROTATION_180;
      case 270:
          return FirebaseVisionImageMetadata.ROTATION_270;
      default:
          throw new IllegalArgumentException(
                  "Rotation must be 0, 90, 180, or 270.");
  }
}

Solution

  • I found out how to do it, and I wrote an article with a demo repo for those who are having the same problem. Here is the link: https://medium.com/@sdptd20/exploring-ocr-capabilities-of-ml-kit-using-camera-x-9949633af0fe

    1. So basically, what I did was get the frames from the CameraX preview using ImageAnalysis.
    2. Then I created a SurfaceView on top of the preview and drew a rectangle on it.
    3. Then I took the offset of the rectangle and cropped my bitmap accordingly (see the coordinate-mapping sketch after this list).
    4. And then I fed the cropped bitmaps to the Firebase text recognizer, so only the text inside the bounding box gets decoded.
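
    One detail step 3 glosses over: the rectangle is drawn in PreviewView pixels, while the analysis frame has its own resolution, so the on-screen rect has to be scaled into frame coordinates before cropping. A minimal sketch of that mapping, assuming the preview and the frame share the same aspect ratio (the helper name is mine, not from the repo):

    //Uses android.graphics.Rect; assumes no letterboxing between preview and frame
    private Rect viewRectToFrameRect(Rect viewRect, int viewWidth, int viewHeight,
                                     int frameWidth, int frameHeight) {
        //Scale factors between the on-screen preview and the analysis frame
        float scaleX = frameWidth / (float) viewWidth;
        float scaleY = frameHeight / (float) viewHeight;
        return new Rect(Math.round(viewRect.left * scaleX),
                        Math.round(viewRect.top * scaleY),
                        Math.round(viewRect.right * scaleX),
                        Math.round(viewRect.bottom * scaleY));
    }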

    Here is the gist of the main activity:

    public class MainActivity extends AppCompatActivity implements SurfaceHolder.Callback {
        TextView textView;
        PreviewView mCameraView;
        SurfaceHolder holder;
        SurfaceView surfaceView;
        Canvas canvas;
        Paint paint;
        int cameraHeight, cameraWidth, xOffset, yOffset, boxWidth, boxHeight;
    
        private ListenableFuture<ProcessCameraProvider> cameraProviderFuture;
        private ExecutorService executor = Executors.newSingleThreadExecutor();
    
        /**
         *Responsible for converting the rotation degrees from CameraX into the one compatible with Firebase ML
         */
    
        private int degreesToFirebaseRotation(int degrees) {
            switch (degrees) {
                case 0:
                    return FirebaseVisionImageMetadata.ROTATION_0;
                case 90:
                    return FirebaseVisionImageMetadata.ROTATION_90;
                case 180:
                    return FirebaseVisionImageMetadata.ROTATION_180;
                case 270:
                    return FirebaseVisionImageMetadata.ROTATION_270;
                default:
                    throw new IllegalArgumentException(
                            "Rotation must be 0, 90, 180, or 270.");
            }
        }
    
    
        /**
         * Starting Camera
         */
        void startCamera(){
            mCameraView = findViewById(R.id.previewView);
    
            cameraProviderFuture = ProcessCameraProvider.getInstance(this);
    
            cameraProviderFuture.addListener(new Runnable() {
                @Override
                public void run() {
                    try {
                        ProcessCameraProvider cameraProvider = cameraProviderFuture.get();
                        MainActivity.this.bindPreview(cameraProvider);
                    } catch (ExecutionException | InterruptedException e) {
                        // No errors need to be handled for this Future.
                        // This should never be reached.
                    }
                }
            }, ContextCompat.getMainExecutor(this));
        }
    
        /**
         *
         * Binding to camera
         */
        private void bindPreview(ProcessCameraProvider cameraProvider) {
            Preview preview = new Preview.Builder()
                    .build();
    
            CameraSelector cameraSelector = new CameraSelector.Builder()
                    .requireLensFacing(CameraSelector.LENS_FACING_BACK)
                    .build();
    
            preview.setSurfaceProvider(mCameraView.createSurfaceProvider());
    
            //Image Analysis Function
            //Set static size according to your device or write a dynamic function for it
            ImageAnalysis imageAnalysis =
                    new ImageAnalysis.Builder()
                            .setTargetResolution(new Size(720, 1488))
                            .setBackpressureStrategy(ImageAnalysis.STRATEGY_KEEP_ONLY_LATEST)
                            .build();
    
    
            imageAnalysis.setAnalyzer(executor, new ImageAnalysis.Analyzer() {
                @SuppressLint("UnsafeExperimentalUsageError")
                @Override
                public void analyze(@NonNull ImageProxy image) {
                    if (image.getImage() == null) {
                        image.close();
                        return;
                    }
                    //Converting CameraX rotation degrees into the Firebase rotation constant
                    int rotationDegrees = degreesToFirebaseRotation(image.getImageInfo().getRotationDegrees());
                    //Getting a FirebaseVisionImage object using the Image object and rotationDegrees
                    final Image mediaImage = image.getImage();
                    FirebaseVisionImage images = FirebaseVisionImage.fromMediaImage(mediaImage, rotationDegrees);
                    //Getting a bitmap from the FirebaseVisionImage object
                    Bitmap bmp = images.getBitmap();
                    //Getting the frame dimensions for cropping
                    int height = bmp.getHeight();
                    int width = bmp.getWidth();
    
                    int left, right, top, bottom, diameter;
    
                    diameter = width;
                    if (height < width) {
                        diameter = height;
                    }
    
                    int offset = (int) (0.05 * diameter);
                    diameter -= offset;
    
    
                    left = width / 2 - diameter / 3;
                    top = height / 2 - diameter / 3;
                    right = width / 2 + diameter / 3;
                    bottom = height / 2 + diameter / 3;
    
                    xOffset = left;
                    yOffset = top;
    
                    //Skip the frame if the overlay has not been drawn yet
                    //(boxWidth and boxHeight are set in DrawFocusRect)
                    if (boxWidth == 0 || boxHeight == 0) {
                        image.close();
                        return;
                    }
                    //Creating the new cropped bitmap
                    Bitmap bitmap = Bitmap.createBitmap(bmp, left, top, boxWidth, boxHeight);
                    //initializing FirebaseVisionTextRecognizer object
                    FirebaseVisionTextRecognizer detector = FirebaseVision.getInstance()
                            .getOnDeviceTextRecognizer();
                    //Passing FirebaseVisionImage Object created from the cropped bitmap
                    Task<FirebaseVisionText> result =  detector.processImage(FirebaseVisionImage.fromBitmap(bitmap))
                            .addOnSuccessListener(new OnSuccessListener<FirebaseVisionText>() {
                                @Override
                                public void onSuccess(FirebaseVisionText firebaseVisionText) {
                                    // Task completed successfully
                                    // ...
                                    textView = findViewById(R.id.text);
                                    //getting the decoded text
                                    String text = firebaseVisionText.getText();
                                    //Setting the decoded text in the TextView
                                    textView.setText(text);
                                    //for getting blocks and line elements
                                    for (FirebaseVisionText.TextBlock block: firebaseVisionText.getTextBlocks()) {
                                        String blockText = block.getText();
                                        for (FirebaseVisionText.Line line: block.getLines()) {
                                            String lineText = line.getText();
                                            for (FirebaseVisionText.Element element: line.getElements()) {
                                                String elementText = element.getText();
    
                                            }
                                        }
                                    }
                                    image.close();
                                }
                            })
                            .addOnFailureListener(
                                    new OnFailureListener() {
                                        @Override
                                        public void onFailure(@NonNull Exception e) {
                                            // Task failed with an exception
                                            // ...
                                            Log.e("Error", e.toString());
                                            image.close();
                                        }
                                    });
                }
    
    
            });
            Camera camera = cameraProvider.bindToLifecycle((LifecycleOwner)this, cameraSelector, imageAnalysis,preview);
        }
    
    
        @Override
        protected void onCreate(Bundle savedInstanceState) {
            super.onCreate(savedInstanceState);
            setContentView(R.layout.activity_main);
    
            //Start Camera
            startCamera();
    
            //Create the bounding box
            surfaceView = findViewById(R.id.overlay);
            surfaceView.setZOrderOnTop(true);
            holder = surfaceView.getHolder();
            holder.setFormat(PixelFormat.TRANSPARENT);
            holder.addCallback(this);
    
        }
    
        /**
         *
         * For drawing the rectangular box
         */
        private void DrawFocusRect(int color) {
            int height = mCameraView.getHeight();
            int width = mCameraView.getWidth();
    
            int left, right, top, bottom, diameter;
    
            diameter = width;
            if (height < width) {
                diameter = height;
            }
    
            int offset = (int) (0.05 * diameter);
            diameter -= offset;
    
            canvas = holder.lockCanvas();
            canvas.drawColor(0, PorterDuff.Mode.CLEAR);
            //border's properties
            paint = new Paint();
            paint.setStyle(Paint.Style.STROKE);
            paint.setColor(color);
            paint.setStrokeWidth(5);
    
            left = width / 2 - diameter / 3;
            top = height / 2 - diameter / 3;
            right = width / 2 + diameter / 3;
            bottom = height / 2 + diameter / 3;
    
            xOffset = left;
            yOffset = top;
            boxHeight = bottom - top;
            boxWidth = right - left;
            //Changing the 3 in diameter/3 above changes the size of the box; the box size is inversely proportional to that divisor
            canvas.drawRect(left, top, right, bottom, paint);
            holder.unlockCanvasAndPost(canvas);
        }
    
        /**
         * Callback functions for the surface Holder
         */
    
        @Override
        public void surfaceCreated(SurfaceHolder holder) {
    
        }
    
        @Override
        public void surfaceChanged(SurfaceHolder holder, int format, int width, int height) {
            //Drawing rectangle
            DrawFocusRect(Color.parseColor("#b3dabb"));
        }
    
        @Override
        public void surfaceDestroyed(SurfaceHolder holder) {
    
        }
    }
    

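    For reference, the gist assumes a layout in which a SurfaceView with the id overlay and a TextView with the id text sit on top of the PreviewView. Something along these lines should work (a sketch; the container choice and exact attributes are illustrative):

    <FrameLayout
        android:layout_width="match_parent"
        android:layout_height="match_parent">

        <androidx.camera.view.PreviewView
            android:id="@+id/previewView"
            android:layout_width="match_parent"
            android:layout_height="match_parent" />

        <!-- Transparent overlay the bounding box is drawn on -->
        <SurfaceView
            android:id="@+id/overlay"
            android:layout_width="match_parent"
            android:layout_height="match_parent" />

        <!-- Shows the decoded text -->
        <TextView
            android:id="@+id/text"
            android:layout_width="match_parent"
            android:layout_height="wrap_content"
            android:layout_gravity="bottom" />
    </FrameLayout>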

    Edit: I have found that you can use a PNG file with an ImageView instead of the SurfaceView. That may be cleaner, and you can also let users superimpose a customised layout.
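
    A sketch of that variant, assuming a hypothetical frame drawable named @drawable/box_frame:

    <ImageView
        android:id="@+id/overlay"
        android:layout_width="match_parent"
        android:layout_height="match_parent"
        android:scaleType="centerInside"
        android:src="@drawable/box_frame" />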

    Edit 2: I have found that sending a bitmap to the image analyzer can be inefficient (I was working on the ML Kit barcode reader and it explicitly logs a warning about this), so what we can do instead is:

    imagePreview.setCropRect(r);
    

    where imagePreview is the ImageProxy image and r is an android.graphics.Rect describing the box.
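
    To make the intent concrete, here is a minimal sketch of how that call could sit inside the analyzer (xOffset, yOffset, boxWidth, and boxHeight are the fields from the gist, already mapped into buffer coordinates). Note that setCropRect only attaches metadata; it does not modify the buffer, so whatever consumes the frame must read the region back with getCropRect() and apply it itself:

    imageAnalysis.setAnalyzer(executor, image -> {
        //Attach the bounding box as crop metadata instead of allocating a cropped bitmap
        //(the rect must be in the buffer's coordinate space, not view pixels)
        Rect r = new Rect(xOffset, yOffset, xOffset + boxWidth, yOffset + boxHeight);
        image.setCropRect(r);
        //Downstream code reads the region back with image.getCropRect() and
        //decodes only that part of the frame before the image is closed
        image.close();
    });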