ios, swift, avplayer, avplayerviewcontroller

How do I add a gesture to AVPlayerViewController's subtitles?


Please help me solve this problem. I couldn't find any existing question about it.

I'm adding subtitles to an AVPlayerViewController.

I got that working, but my customer wants to add a gesture to the subtitles.

For instance (see the sketch after this list):

  1. Tap a keyword in the subtitle.
  2. Pass the tapped keyword to another view controller.
  3. Present that view controller with the tapped keyword.
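
In code, the hand-off I'm after would look roughly like this (KeywordViewController is just a placeholder name for my second screen):

    import UIKit

    final class KeywordViewController: UIViewController {
        let keyword: String

        init(keyword: String) {
            self.keyword = keyword
            super.init(nibName: nil, bundle: nil)
        }

        required init?(coder: NSCoder) { fatalError("init(coder:) has not been implemented") }
    }

    // Somewhere after finding out which word was tapped:
    // present(KeywordViewController(keyword: tappedWord), animated: true)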

First of all: is it even possible to add a gesture to subtitles?

I don't have much coding experience.

My English isn't very good, sorry.


Solution

  • Hello Woo sung Kim! Welcome to the forum. Unfortunately, the short answer is no. There is no API to query the word at a specific tap location, and the subtitles aren't even rendered using a UILabel (or CATextLayer).

    The longer answer is: if you reeeeeally want to do it, there are options, but I wouldn't use this in production. Inspecting the view hierarchy of an AVPlayerViewController while it is displaying subtitles shows that the subtitles are rendered inside a FigFCRCALayerOutputNodeLayer. On a simulator running iOS 14.4 this layer sits at avPlayerVc.view.subviews.first?.subviews.first?.subviews.first?.layer.sublayers?.first?.sublayers?[1].sublayers?.first?.sublayers?.first?.sublayers?.first (this key path could change at any time and probably doesn't work on earlier versions). The text is set directly as the layer's contents (including the rounded, semi-transparent background).
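
    Rather than hard-coding that key path, you could also walk the layer tree and look the layer up by its (private) class name. This is only a sketch under the same caveats: FigFCRCALayerOutputNodeLayer is a private class that can change in any release, and findSubtitleLayer is my own helper name, not an AVKit API.

    func findSubtitleLayer(in layer: CALayer) -> CALayer? {
        // The private class the subtitles were observed to render into (iOS 14.4 simulator).
        if String(describing: type(of: layer)) == "FigFCRCALayerOutputNodeLayer" {
            return layer
        }
        // Depth-first search through the sublayers.
        for sublayer in layer.sublayers ?? [] {
            if let match = findSubtitleLayer(in: sublayer) {
                return match
            }
        }
        return nil
    }

    // Usage: let subtitleLayer = findSubtitleLayer(in: avPlayerVc.view.layer)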


    I played around with Vision for a bit, and while it lags a little, it somewhat works: feed the layer's contents (a CGImage) to a text recognition request, split the result into words, and check whether the touch location falls inside a word's bounding box. To get the touch point I subclassed AVPlayerViewController (I know, it's bad; if you use an AVPlayerLayer directly it's even easier) and converted the touch from touchesBegan to screen coordinates:

    import AVKit
    import UIKit

    final class PlayerVC: AVPlayerViewController {
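      // Called with the tap location converted to screen coordinates.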
      var onTap: ((CGPoint) -> Void)?
    
      override func touchesBegan(_ touches: Set<UITouch>, with event: UIEvent?) {
        super.touchesBegan(touches, with: event)
    
        guard let tapLocation = touches.first?.location(in: nil) else { return }
        onTap?(tapLocation)
      }
    }
    

    The actual code to get the word at the tap location looks like this (it uses the Vision framework, so add import Vision; playerVc points to a PlayerVC instance):

    func tap(tapLocation: CGPoint) {
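        // Dig out the private subtitle layer (same key path as above) and make sure
        // its contents are a CGImage before running text recognition on it.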
        guard
            let subtitleLayer = playerVc.view.subviews.first?.subviews.first?.subviews.first?.layer.sublayers?.first?.sublayers?[1].sublayers?.first?.sublayers?.first?.sublayers?.first,
            CFGetTypeID(subtitleLayer.contents as CFTypeRef) == CGImage.typeID
        else { return }
    
        let image = subtitleLayer.contents as! CGImage
        let requestHandler = VNImageRequestHandler(cgImage: image)
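        // Completion handler: split each recognized line into words and hit-test
        // the tap against each word's bounding box.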
        let recognizeTextHandler: (VNRequest, Error?) -> Void = { request, error in
            guard let observations = request.results as? [VNRecognizedTextObservation] else { return }
    
            for observation in observations {
                guard let topCandidate = observation.topCandidates(1).first, topCandidate.string != "" else { continue }
    
                for word in topCandidate.string.components(separatedBy: " ") {
                    guard let range = topCandidate.string.range(of: word) else { continue }
    
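                    // Vision's boundingBox is normalized with the origin at the bottom-left;
                    // flip the y-axis and scale to the layer's size, then convert to screen
                    // coordinates for the hit test.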
                    if let boundingBox = try? topCandidate.boundingBox(for: range) {

                        let transform = CGAffineTransform.identity
                            .scaledBy(x: 1, y: -1)
                            .translatedBy(x: 0, y: -subtitleLayer.frame.size.height)
                            .scaledBy(x: subtitleLayer.frame.size.width, y: subtitleLayer.frame.size.height)

                        let convertedTopLeft = boundingBox.topLeft.applying(transform)
                        let convertedBottomRight = boundingBox.bottomRight.applying(transform)

                        let localRect = CGRect(x: convertedTopLeft.x,
                                               y: convertedTopLeft.y,
                                               width: convertedBottomRight.x - convertedTopLeft.x,
                                               height: convertedBottomRight.y - convertedTopLeft.y)

                        let globalRect = subtitleLayer.convert(localRect, to: nil)

                        if globalRect.contains(tapLocation) {
                            print("You tapped \(word)")
                        }
                    }
                }
            }
        }
    
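        // Configure the recognition request (accurate level, no language correction)
        // and run it on the subtitle image.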
        let request = VNRecognizeTextRequest(completionHandler: recognizeTextHandler)
        request.usesLanguageCorrection = false
        request.recognitionLevel = .accurate
    
        do {
            try requestHandler.perform([request])
        } catch {
            print("Unable to perform the request \(error)")
        }
    }
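
    To tie it all together, here is a rough sketch of how the pieces could be wired up. It assumes playerVc is a stored property of the presenting view controller (so tap(tapLocation:) can read it) and that videoURL already exists; the print statement above is where you would instead hand the word to your next view controller and present it:

    playerVc = PlayerVC()
    playerVc.player = AVPlayer(url: videoURL)   // videoURL is assumed to exist

    // Forward every touch (screen coordinates) to the word lookup above.
    playerVc.onTap = { [weak self] point in
        self?.tap(tapLocation: point)
    }

    present(playerVc, animated: true)
    playerVc.player?.play()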