ios, swift, avplayer, avplayerviewcontroller

How do I add a gesture to AVPlayerViewController's subtitles?


Please help me solve this problem. I couldn't find any existing question about it.

I'm adding subtitles to an AVPlayerViewController.

I got that working, but my customer wants to add a gesture to the subtitles.

For instance (see the sketch after this list):

  1. Tap a keyword in the subtitle.
  2. Pass the tapped keyword to another view controller.
  3. Present that view controller with the tapped keyword.
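
In code, the hand-off I'm after would look roughly like this (KeywordViewController is just a placeholder name for my second screen):

    import UIKit

    final class KeywordViewController: UIViewController {
        let keyword: String

        init(keyword: String) {
            self.keyword = keyword
            super.init(nibName: nil, bundle: nil)
        }

        required init?(coder: NSCoder) { fatalError("init(coder:) has not been implemented") }
    }

    // Somewhere after finding out which word was tapped:
    // present(KeywordViewController(keyword: tappedWord), animated: true)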

First of all: is it even possible to add a gesture to subtitles?

I don't have much coding experience.

My English isn't very good, sorry.


Solution

  • Hello Woo sung Kim! Welcome to the forum. Unfortunately, the short answer is no. There is no API to query the word at a specific tap location, and the subtitles aren't even rendered using a UILabel (or CATextLayer).

    The longer answer is: if you reeeeeally want to do it, there are options, but I wouldn't use this in production. Inspecting the view hierarchy of an AVPlayerViewController while it is displaying subtitles shows that the subtitles are rendered inside a FigFCRCALayerOutputNodeLayer. On a simulator running iOS 14.4 this layer sits at avPlayerVc.view.subviews.first?.subviews.first?.subviews.first?.layer.sublayers?.first?.sublayers?[1].sublayers?.first?.sublayers?.first?.sublayers?.first (this key path could change at any time and probably doesn't work on earlier versions). The text is set directly as the layer's contents (including the rounded, semi-transparent background).
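
    Rather than hard-coding that key path, you could also walk the layer tree and look the layer up by its (private) class name. This is only a sketch under the same caveats: FigFCRCALayerOutputNodeLayer is a private class that can change in any release, and findSubtitleLayer is my own helper name, not an AVKit API.

    func findSubtitleLayer(in layer: CALayer) -> CALayer? {
        // The private class the subtitles were observed to render into (iOS 14.4 simulator).
        if String(describing: type(of: layer)) == "FigFCRCALayerOutputNodeLayer" {
            return layer
        }
        // Depth-first search through the sublayers.
        for sublayer in layer.sublayers ?? [] {
            if let match = findSubtitleLayer(in: sublayer) {
                return match
            }
        }
        return nil
    }

    // Usage: let subtitleLayer = findSubtitleLayer(in: avPlayerVc.view.layer)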


    I played around with Vision for a bit, and while it lags a little, it somewhat works: feed the layer's contents (a CGImage) to a text recognition request, split the result into words, and check whether the touch location falls inside a word's bounding box. To get the touch point I subclassed AVPlayerViewController (I know, it's bad; if you use an AVPlayerLayer directly it's even easier) and converted the touch from touchesBegan to screen coordinates:

    import AVKit
    import UIKit

    final class PlayerVC: AVPlayerViewController {
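      // Called with the tap location converted to screen coordinates.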
      var onTap: ((CGPoint) -> Void)?
    
      override func touchesBegan(_ touches: Set<UITouch>, with event: UIEvent?) {
        super.touchesBegan(touches, with: event)
    
        guard let tapLocation = touches.first?.location(in: nil) else { return }
        onTap?(tapLocation)
      }
    }
    

    The actual code to get the word at the tap location looks like this (it uses the Vision framework, so add import Vision; playerVc points to a PlayerVC instance):

    func tap(tapLocation: CGPoint) {
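        // Dig out the private subtitle layer (same key path as above) and make sure
        // its contents are a CGImage before running text recognition on it.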
        guard
            let subtitleLayer = playerVc.view.subviews.first?.subviews.first?.subviews.first?.layer.sublayers?.first?.sublayers?[1].sublayers?.first?.sublayers?.first?.sublayers?.first,
            CFGetTypeID(subtitleLayer.contents as CFTypeRef) == CGImage.typeID
        else { return }
    
        let image = subtitleLayer.contents as! CGImage
        let requestHandler = VNImageRequestHandler(cgImage: image)
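        // Completion handler: split each recognized line into words and hit-test
        // the tap against each word's bounding box.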
        let recognizeTextHandler: (VNRequest, Error?) -> Void = { request, error in
            guard let observations = request.results as? [VNRecognizedTextObservation] else { return }
    
            for observation in observations {
                guard let topCandidate = observation.topCandidates(1).first, topCandidate.string != "" else { continue }
    
                for word in topCandidate.string.components(separatedBy: " ") {
                    guard let range = topCandidate.string.range(of: word) else { continue }
    
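                    // Vision's boundingBox is normalized with the origin at the bottom-left;
                    // flip the y-axis and scale to the layer's size, then convert to screen
                    // coordinates for the hit test.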
                    if let boundingBox = try? topCandidate.boundingBox(for: range) {

                        let transform = CGAffineTransform.identity
                            .scaledBy(x: 1, y: -1)
                            .translatedBy(x: 0, y: -subtitleLayer.frame.size.height)
                            .scaledBy(x: subtitleLayer.frame.size.width, y: subtitleLayer.frame.size.height)

                        let convertedTopLeft = boundingBox.topLeft.applying(transform)
                        let convertedBottomRight = boundingBox.bottomRight.applying(transform)

                        let localRect = CGRect(x: convertedTopLeft.x,
                                               y: convertedTopLeft.y,
                                               width: convertedBottomRight.x - convertedTopLeft.x,
                                               height: convertedBottomRight.y - convertedTopLeft.y)

                        let globalRect = subtitleLayer.convert(localRect, to: nil)

                        if globalRect.contains(tapLocation) {
                            print("You tapped \(word)")
                        }
                    }
                }
            }
        }
    
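        // Configure the recognition request (accurate level, no language correction)
        // and run it on the subtitle image.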
        let request = VNRecognizeTextRequest(completionHandler: recognizeTextHandler)
        request.usesLanguageCorrection = false
        request.recognitionLevel = .accurate
    
        do {
            try requestHandler.perform([request])
        } catch {
            print("Unable to perform the request \(error)")
        }
    }
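
    To tie it all together, here is a rough sketch of how the pieces could be wired up. It assumes playerVc is a stored property of the presenting view controller (so tap(tapLocation:) can read it) and that videoURL already exists; the print statement above is where you would instead hand the word to your next view controller and present it:

    playerVc = PlayerVC()
    playerVc.player = AVPlayer(url: videoURL)   // videoURL is assumed to exist

    // Forward every touch (screen coordinates) to the word lookup above.
    playerVc.onTap = { [weak self] point in
        self?.tap(tapLocation: point)
    }

    present(playerVc, animated: true)
    playerVc.player?.play()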