Tags: swiftui, bounding-box, cgrect, firebase-mlkit, geometryreader

SwiftUI: Drawing rectangles around elements recognized with Firebase ML Kit


I am currently trying to draw boxes around the text recognized with Firebase ML Kit on top of the image. So far I have not had any success: I can't see any box at all because they are all drawn offscreen. I was looking at this article for reference: https://medium.com/swlh/how-to-draw-bounding-boxes-with-swiftui-d93d1414eb00 and also at this project: https://github.com/firebase/quickstart-ios/blob/master/mlvision/MLVisionExample/ViewController.swift

This is the view where the boxes should be shown:

struct ImageScanned: View {
var image: UIImage
@Binding var rectangles: [CGRect]
@State var viewSize: CGSize = .zero

var body: some View {
    // TODO: fix scaling
    ZStack {
        Image(uiImage: image)
            .resizable()
            .scaledToFit()
            .overlay(
                GeometryReader { geometry in
                    ZStack {
                        ForEach(self.transformRectangles(geometry: geometry)) { rect in
                            Rectangle()
                                .path(in: CGRect(
                                    x: rect.x,
                                    y: rect.y,
                                    width: rect.width,
                                    height: rect.height))
                                .stroke(Color.red, lineWidth: 2.0)
                        }
                    }
                }
            )
    }
}
private func transformRectangles(geometry: GeometryProxy) -> [DetectedRectangle] {
    var rectangles: [DetectedRectangle] = []

    let imageViewWidth = geometry.frame(in: .global).size.width
    let imageViewHeight = geometry.frame(in: .global).size.height
    let imageWidth = image.size.width
    let imageHeight = image.size.height

    let imageViewAspectRatio = imageViewWidth / imageViewHeight
    let imageAspectRatio = imageWidth / imageHeight
    let scale = (imageViewAspectRatio > imageAspectRatio)
      ? imageViewHeight / imageHeight : imageViewWidth / imageWidth

    let scaledImageWidth = imageWidth * scale
    let scaledImageHeight = imageHeight * scale
    let xValue = (imageViewWidth - scaledImageWidth) / CGFloat(2.0)
    let yValue = (imageViewHeight - scaledImageHeight) / CGFloat(2.0)

    var transform = CGAffineTransform.identity.translatedBy(x: xValue, y: yValue)
    transform = transform.scaledBy(x: scale, y: scale)

    for rect in self.rectangles {
        let rectangle = rect.applying(transform)
        rectangles.append(DetectedRectangle(width: rectangle.width, height: rectangle.height, x: rectangle.minX, y: rectangle.minY))
    }
    return rectangles
}

}

struct DetectedRectangle: Identifiable {
    var id = UUID()
    var width: CGFloat = 0
    var height: CGFloat = 0
    var x: CGFloat = 0
    var y: CGFloat = 0
}

This is the view in which that view is nested:

struct StartScanView: View {
@State var showCaptureImageView: Bool = false
@State var image: UIImage? = nil
@State var rectangles: [CGRect] = []

var body: some View {
    ZStack {
        if showCaptureImageView {
            CaptureImageView(isShown: $showCaptureImageView, image: $image)
        } else {
            VStack {

                Button(action: {
                    self.showCaptureImageView.toggle()
                }) {
                    Text("Start Scanning")
                }

                // show here View with rectangles on top of image
                if self.image != nil {
                    ImageScanned(image: self.image ?? UIImage(), rectangles: $rectangles)
                }


                Button(action: {
                    self.processImage()
                }) {
                    Text("Process Image")
                }
            }
        }
    }
}

func processImage() {
    let scaledImageProcessor = ScaledElementProcessor()
    if image != nil {
        scaledImageProcessor.process(in: image!) { text in
            for block in text.blocks {
                for line in block.lines {
                    for element in line.elements {
                        self.rectangles.append(element.frame)
                    }
                }
            }
        }
    }
}

}
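
For reference, ScaledElementProcessor is not shown here. A minimal sketch of what it could look like, assuming ML Kit's on-device text recognizer from the FirebaseMLVision pod (only the process(in:) call site above is taken from my code, the rest of the class is an assumption):

import UIKit
import FirebaseMLVision

// Hypothetical sketch of the `ScaledElementProcessor` used above: it wraps
// ML Kit's on-device text recognizer and hands the recognized `VisionText`
// back to the caller.
class ScaledElementProcessor {
    private let textRecognizer = Vision.vision().onDeviceTextRecognizer()

    func process(in image: UIImage, callback: @escaping (VisionText) -> Void) {
        let visionImage = VisionImage(image: image)
        textRecognizer.process(visionImage) { result, error in
            // The frames in `result` are in the image's own coordinate space,
            // not in the coordinate space of the view that displays the image.
            guard error == nil, let result = result else { return }
            callback(result)
        }
    }
}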

The calculation from the tutorial made the rectangles too big, and the one from the sample project made them too small (similarly for the height). Unfortunately I can't find out which size Firebase uses to determine the elements' frames. This is how it looks: [screenshot of the misplaced rectangles]. Without scaling the width and height at all, the rectangles seem to be roughly the size they are supposed to be (though not exactly), which leads me to assume that ML Kit's size calculation is not done in proportion to image.size.width/height.
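
For illustration, the frames ML Kit returns are expressed relative to the UIImage itself (this is also what the transform in the solution below assumes), so a frame only lines up on screen after applying the aspect-fit scale and the centering offset. A small sketch of that mapping with made-up numbers:

import CoreGraphics

// Hypothetical numbers: a 3000×4000 image displayed aspect-fit inside a 300×500 view.
let imageSize = CGSize(width: 3000, height: 4000)
let viewSize = CGSize(width: 300, height: 500)

// Aspect-fit scale: the smaller of the two ratios wins
// (equivalent to the aspect-ratio comparison used in the code above).
let scale = min(viewSize.width / imageSize.width,
                viewSize.height / imageSize.height)             // 0.1

// Offsets that center the scaled image inside the view.
let xOffset = (viewSize.width - imageSize.width * scale) / 2    // 0
let yOffset = (viewSize.height - imageSize.height * scale) / 2  // 50

// A frame as ML Kit would report it, in image coordinates (made-up values).
let elementFrame = CGRect(x: 600, y: 800, width: 900, height: 200)

// Map it into view coordinates: scale first, then translate.
let viewFrame = elementFrame
    .applying(CGAffineTransform(scaleX: scale, y: scale))
    .offsetBy(dx: xOffset, dy: yOffset)
// viewFrame == CGRect(x: 60, y: 130, width: 90, height: 20)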


Solution

  • This is how I changed the ForEach loop:

    Image(uiImage: uiimage!)
        .resizable()
        .scaledToFit()
        .overlay(
            GeometryReader { (geometry: GeometryProxy) in
                ForEach(self.blocks, id: \.self) { (block: VisionTextBlock) in
                    Rectangle()
                        .path(in: block.frame.applying(self.transformMatrix(geometry: geometry, image: self.uiimage!)))
                        .stroke(Color.purple, lineWidth: 2.0)
                }
            }
        )

    Instead of passing the x, y, width, and height, I am passing the return value of the transformMatrix function to the path function.

    My transformMatrix function is

        private func transformMatrix(geometry: GeometryProxy, image: UIImage) -> CGAffineTransform {
            let imageViewWidth = geometry.size.width
            let imageViewHeight = geometry.size.height
            let imageWidth = image.size.width
            let imageHeight = image.size.height

            let imageViewAspectRatio = imageViewWidth / imageViewHeight
            let imageAspectRatio = imageWidth / imageHeight
            let scale = (imageViewAspectRatio > imageAspectRatio)
                ? imageViewHeight / imageHeight
                : imageViewWidth / imageWidth

            // The image view's `contentMode` is `scaleAspectFit`, which scales the image to fit
            // the size of the image view while maintaining the aspect ratio. Multiplying by
            // `scale` maps the image's original size to its displayed size.
            let scaledImageWidth = imageWidth * scale
            let scaledImageHeight = imageHeight * scale
            let xValue = (imageViewWidth - scaledImageWidth) / CGFloat(2.0)
            let yValue = (imageViewHeight - scaledImageHeight) / CGFloat(2.0)

            var transform = CGAffineTransform.identity.translatedBy(x: xValue, y: yValue)
            transform = transform.scaledBy(x: scale, y: scale)
            return transform
        }
    

    and the output is

    [screenshot of the working code in the Simulator]
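
    For completeness, here is a hedged sketch of how the overlay above could be embedded in a complete view. The names blocks and uiimage mirror my snippet, while the view name ScannedImageView and the idea of passing blocks in from the parent (e.g. blocks = visionText.blocks from the ML Kit completion handler) are assumptions added for illustration; transformMatrix is the function from above, unchanged:

        import SwiftUI
        import FirebaseMLVision

        struct ScannedImageView: View {
            // Supplied by the parent once recognition has finished.
            var uiimage: UIImage?
            var blocks: [VisionTextBlock]

            var body: some View {
                Image(uiImage: uiimage!)
                    .resizable()
                    .scaledToFit()
                    .overlay(
                        GeometryReader { (geometry: GeometryProxy) in
                            ForEach(self.blocks, id: \.self) { (block: VisionTextBlock) in
                                Rectangle()
                                    .path(in: block.frame.applying(self.transformMatrix(geometry: geometry, image: self.uiimage!)))
                                    .stroke(Color.purple, lineWidth: 2.0)
                            }
                        }
                    )
            }

            // The transformMatrix function from above, unchanged.
            private func transformMatrix(geometry: GeometryProxy, image: UIImage) -> CGAffineTransform {
                let imageViewWidth = geometry.size.width
                let imageViewHeight = geometry.size.height
                let imageWidth = image.size.width
                let imageHeight = image.size.height

                let imageViewAspectRatio = imageViewWidth / imageViewHeight
                let imageAspectRatio = imageWidth / imageHeight
                let scale = (imageViewAspectRatio > imageAspectRatio)
                    ? imageViewHeight / imageHeight
                    : imageViewWidth / imageWidth

                let scaledImageWidth = imageWidth * scale
                let scaledImageHeight = imageHeight * scale
                let xValue = (imageViewWidth - scaledImageWidth) / CGFloat(2.0)
                let yValue = (imageViewHeight - scaledImageHeight) / CGFloat(2.0)

                var transform = CGAffineTransform.identity.translatedBy(x: xValue, y: yValue)
                transform = transform.scaledBy(x: scale, y: scale)
                return transform
            }
        }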