
How to track the barcode with highest confidence


I am using the Vision framework to detect barcodes. I want to draw a rectangle around the barcode with the highest confidence on live video; that is, I want the rectangle to track that barcode in the live preview.

So I have this code to detect barcodes within a region of interest (ROI):

lazy var barcodeRequest: VNDetectBarcodesRequest = {
    let barcodeRequest = VNDetectBarcodesRequest { [weak self] request, error in
        guard error == nil else {
            print("ERROR: \(error?.localizedDescription ?? "error")")
            return
        }
        self?.resultClassification(request)
    }
    /// normalized coordinates: a horizontal band across the middle of the frame
    barcodeRequest.regionOfInterest = CGRect(x: 0,
                                             y: 0.3,
                                             width: 1,
                                             height: 0.4)
    return barcodeRequest
}()

This method fires when barcodes are detected:

func resultClassification(_ request: VNRequest) {
    guard let barcodes = request.results,
          let potentialCodes = barcodes as? [VNBarcodeObservation]
    else { return }

    // choose the barcode with the highest confidence
    let highestConfidenceBarcodeDetected = potentialCodes.max(by: { $0.confidence < $1.confidence })

    // do something with highestConfidenceBarcodeDetected

    // 1
}

This is my problem.

Now that I have the highest-confidence barcode, I want to track it around the screen, so I think I will have to add code at // 1.

But before that, I have to define this for the tracker:

var inputObservation: VNDetectedObjectObservation!


lazy var barcodeTrackingRequest: VNTrackObjectRequest = {
  let barcodeTrackingRequest = VNTrackObjectRequest(detectedObjectObservation: inputObservation) { [weak self] request, error in
    guard error == nil else {
      print("Detection error: \(String(describing: error)).")
      return
    }
    self?.resultClassificationTracker(request)
  }
  return barcodeTrackingRequest
}()

func resultClassificationTracker(_ request: VNRequest) {
  // all I want from this is to store the bounding box in a property
}

Now, how do I connect these two pieces of code so that resultClassificationTracker fires every time the tracker produces a new bounding box?


Solution

  • I did something similar a while ago and wrote an article on it. It's for VNRecognizeTextRequest, not VNDetectBarcodesRequest, but the approach is similar. This is what I did:

    • Perform VNImageRequestHandler continuously (once one finishes, start another)
    • Store the detection indicator view in a property var previousTrackingView: UIView?
    • Animate the detection indicator to the new rectangle whenever the request handler finishes
    • Use Core Motion to detect device movement, and adjust the frame of the detection indicator

    Here is the result (demo GIF not reproduced here):

    As you can see, the height/y coordinate is not very accurate. My guess is that Vision only needs a horizontal line to scan barcodes - like the laser scanners in grocery stores - so it doesn't return the full height. But that's a different problem.

    Perform VNImageRequestHandler continuously (once one finishes, start another)

    For this, I'm making a property busyPerformingVisionRequest, and whenever it is false, I call the Vision request. This happens inside captureOutput(_:didOutput:from:), which gets called whenever the camera frame changes.

    
    class ViewController: UIViewController, AVCaptureVideoDataOutputSampleBufferDelegate {
    
        var busyPerformingVisionRequest = false
    
        func captureOutput(_ output: AVCaptureOutput, didOutput sampleBuffer: CMSampleBuffer, from connection: AVCaptureConnection) {
            guard let pixelBuffer = CMSampleBufferGetImageBuffer(sampleBuffer) else { return }
    
            if busyPerformingVisionRequest == false {
                lookForBarcodes(in: pixelBuffer) /// start a new Vision request as often as possible
            }
        }
    }
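
    The answer doesn't show lookForBarcodes(in:) itself. Here is a minimal sketch of what it might look like, assuming a portrait back camera and that the request's completion handler is the resultClassificationTracker method below:

    /// Hypothetical helper, not shown in the original answer: mark the pipeline
    /// as busy, then perform a barcode-detection request off the main thread.
    func lookForBarcodes(in pixelBuffer: CVPixelBuffer) {
        busyPerformingVisionRequest = true

        let request = VNDetectBarcodesRequest { [weak self] request, error in
            self?.resultClassificationTracker(request: request, error: error)
        }

        /// .right assumes a portrait device using the back camera
        let handler = VNImageRequestHandler(cvPixelBuffer: pixelBuffer, orientation: .right)

        DispatchQueue.global(qos: .userInitiated).async { [weak self] in
            do {
                try handler.perform([request])
            } catch {
                print("Vision request failed: \(error)")
                self?.busyPerformingVisionRequest = false
            }
        }
    }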
    

    Store the detection indicator view in a property var previousTrackingView: UIView?

    Below is my Vision handler, which gets called when the Vision request completes. I first set busyPerformingVisionRequest back to false so that another Vision request can be made. Then I convert the bounding box to screen coordinates and call self.drawTrackingView(at: convertedRect).

    func resultClassificationTracker(request: VNRequest?, error: Error?) {
        busyPerformingVisionRequest = false
        
        if let results = request?.results {
            if let observation = results.first as? VNBarcodeObservation {
                
                var x = observation.boundingBox.origin.x
                var y = 1 - observation.boundingBox.origin.y /// flip: Vision's origin is bottom-left, UIKit's is top-left
                var height = CGFloat(0) /// ignore the bounding height, it isn't reliable (see above)
                var width = observation.boundingBox.width
                
                /// we're going to do some converting
                /// (aspectRatioWidthOverHeight and deviceSize are properties defined elsewhere:
                /// the camera buffer's aspect ratio and the screen size)
                let convertedOriginalWidthOfBigImage = aspectRatioWidthOverHeight * deviceSize.height
                let offsetWidth = convertedOriginalWidthOfBigImage - deviceSize.width
                
                /// The pixel buffer that we got Vision to process is bigger than the device's screen, so we need to adjust it
                let offHalf = offsetWidth / 2
                
                width *= convertedOriginalWidthOfBigImage
                height = width * (CGFloat(9) / CGFloat(16)) /// fake a height from the width, using a 16:9 ratio
                x *= convertedOriginalWidthOfBigImage
                x -= offHalf
                y *= deviceSize.height
                y -= height
                
                let convertedRect = CGRect(x: x, y: y, width: width, height: height)
                
                DispatchQueue.main.async {
                    self.drawTrackingView(at: convertedRect)
                }
                
            }
        }
    }
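
    As a side note (not part of the original approach): if the camera feed is rendered through an AVCaptureVideoPreviewLayer, AVFoundation can do this coordinate conversion for you. A sketch, assuming a previewLayer: AVCaptureVideoPreviewLayer property exists:

    /// Converts a Vision bounding box to coordinates in `previewLayer`.
    func convertToLayerRect(_ boundingBox: CGRect) -> CGRect {
        /// flip Y first: Vision uses a bottom-left origin, while the
        /// metadata-output space uses a top-left origin
        let metadataRect = CGRect(x: boundingBox.origin.x,
                                  y: 1 - boundingBox.origin.y - boundingBox.height,
                                  width: boundingBox.width,
                                  height: boundingBox.height)
        return previewLayer.layerRectConverted(fromMetadataOutputRect: metadataRect)
    }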
    

    Animate the detection indicator to the new rectangle whenever the request handler finishes

    This is my drawTrackingView function. If a tracking rectangle view has already been drawn, it animates it to the new frame; if not, it just adds one as a subview.

    func drawTrackingView(at rect: CGRect) {
        if let previousTrackingView = previousTrackingView { /// already drawn one previously, just change the frame now
            UIView.animate(withDuration: 0.8) {
                previousTrackingView.frame = rect
            }
            
        } else { /// add it as a subview
            let trackingView = UIView(frame: rect)
            drawingView.addSubview(trackingView) /// drawingView is an overlay view on top of the camera preview
            trackingView.backgroundColor = UIColor.blue.withAlphaComponent(0.2)
            trackingView.layer.borderWidth = 3
            trackingView.layer.borderColor = UIColor.blue.cgColor
            
            previousTrackingView = trackingView
        }
    }
    

    Use Core Motion to detect device movement, and adjust the frame of the detection indicator

    I first store a couple of motion-related properties. Then, in viewDidLoad, I start the motion updates.

    -----ViewController.swift-----
    
    /// motionManager will be what we'll use to get device motion
    var motionManager = CMMotionManager()
        
    /// this will be the "device’s true orientation in space" (Source: https://nshipster.com/cmdevicemotion/)
    var initialAttitude: CMAttitude?
         
    /// we'll later read these values to update the highlight's position
    var motionX = Double(0) /// aka Roll
    var motionY = Double(0) /// aka Pitch
    
    override func viewDidLayoutSubviews() {
        super.viewDidLayoutSubviews()
        
        /// viewDidLoad() is often too early to get the first initial attitude, so we use viewDidLayoutSubviews() instead
        /// (it can be called more than once, so only store the attitude the first time)
        if initialAttitude == nil, let currentAttitude = motionManager.deviceMotion?.attitude {
            /// we populate initialAttitude with the current attitude
            initialAttitude = currentAttitude
        }
        
    }
    override func viewDidLoad() {
        super.viewDidLoad()
        
        /// This is how often we will get device motion updates
        /// 0.03 seconds is frequent enough, and is about the rate at which the video frames change
        motionManager.deviceMotionUpdateInterval = 0.03
        
        motionManager.startDeviceMotionUpdates(to: .main) {
            [weak self] (data, error) in
            guard let data = data, error == nil else {
                return
            }
            
            /// This function will be called every 0.03 seconds
            self?.updateTrackingFrames(attitude: data.attitude)
        }
    
        ...
    }
    

    Every 0.03 seconds, I call updateTrackingFrames, which reads the device's new physical movement. This is meant to reduce jitter, like when your user's hands are shaking.

    func updateTrackingFrames(attitude: CMAttitude) {
        /// initialAttitude is an optional that points to the reference frame the device started at
        /// we set this when the device lays out its subviews on the first launch
        if let initAttitude = initialAttitude {
            
            /// We can now translate the current attitude to the reference frame
            attitude.multiply(byInverseOf: initAttitude)
            
            /// Roll is the movement of the phone left and right, Pitch is forwards and backwards
            /// (radiansToDegrees is a custom helper extension; see below)
            let rollValue = attitude.roll.radiansToDegrees
            let pitchValue = attitude.pitch.radiansToDegrees
            
            /// This is a magic number, but for simplicity, we won't do any advanced trigonometry -- also, 3 works pretty well
            let conversion = Double(3)
            
            /// Here, we figure out how much the values changed by comparing against the previous values (motionX and motionY)
            let differenceInX = (rollValue - motionX) * conversion
            let differenceInY = (pitchValue - motionY) * conversion
            
            /// Now we adjust the tracking view's position
            if let previousTrackingView = previousTrackingView {
                previousTrackingView.frame.origin.x += CGFloat(differenceInX)
                previousTrackingView.frame.origin.y += CGFloat(differenceInY)
            }
            
            /// finally, we put the new attitude values into motionX and motionY so we can compare against them in 0.03 seconds (the next time this function is called)
            motionX = rollValue
            motionY = pitchValue
        }
    }
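
    Note that radiansToDegrees isn't part of the standard library; the code above assumes a small helper extension along these lines:

    /// Assumed helper (not shown in the original answer): CMAttitude angles
    /// are in radians, so convert them to degrees before comparing.
    extension Double {
        var radiansToDegrees: Double { self * 180 / .pi }
    }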
    

    This Core Motion implementation isn't very accurate - I hardcode the multiplier constant (Double(3)) that adjusts the frame of the tracking indicator. But it's enough to cancel out small jitter.

    Here is the final repo: https://github.com/aheze/BarcodeScanner