Please help me solve a problem — I couldn't find any existing question about this.
I'm adding subtitles to an AVPlayerViewController.
I got that working, but my customer wants to add a gesture to the subtitles — for instance, tapping a word in the subtitle.
First of all: is it even possible to add a gesture to subtitles?
I don't have much coding experience.
My English isn't very good, sorry! TT
Hello Woo sung Kim! Welcome to the forum. Unfortunately, the short answer is no. There is no API to query the word at a specific tap location, and the content isn't even rendered using a UILabel (or CATextLayer).
The longer answer is: if you reeeeeally want to do it there are options, but I wouldn't use this in production. Inspecting the view hierarchy of an AVPlayerViewController while it displays subtitles shows that the subtitles are rendered inside a FigFCRCALayerOutputNodeLayer. On a simulator running iOS 14.4 this layer sits at
avPlayerVc.view.subviews.first?.subviews.first?.subviews.first?.layer.sublayers?.first?.sublayers?[1].sublayers?.first?.sublayers?.first?.sublayers?.first
(this key path could change at any time and probably doesn't even work below that version). The text is set directly as the layer's contents (including the rounded, semi-transparent background).
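Rather than hard-coding that key path, you could search the layer tree for a layer whose class name matches. This is a sketch and still fragile — it assumes the private class name stays FigFCRCALayerOutputNodeLayer:

```swift
import QuartzCore

// Hedged sketch: recursively search a layer tree for the private subtitle layer
// by comparing class names. Relies on a private class name; may break anytime.
func findSubtitleLayer(in layer: CALayer) -> CALayer? {
    if String(describing: type(of: layer)) == "FigFCRCALayerOutputNodeLayer" {
        return layer
    }
    for sublayer in layer.sublayers ?? [] {
        if let match = findSubtitleLayer(in: sublayer) {
            return match
        }
    }
    return nil
}
```

You would call it as `findSubtitleLayer(in: avPlayerVc.view.layer)`, which survives small hierarchy changes better than an index chain.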
I played around with Vision for a bit, and while it lags a little, it somewhat works: feed the layer's contents (a CGImage) to a text-recognition request, split the result into words, and check whether the touch location falls within a word's bounds. To get the touch point I subclassed AVPlayerViewController (I know, it's bad, but if you use an AVPlayerLayer directly it's even easier) and converted the touch from touchesBegan to screen coordinates:
import AVKit
import UIKit

final class PlayerVC: AVPlayerViewController {
    var onTap: ((CGPoint) -> Void)?

    override func touchesBegan(_ touches: Set<UITouch>, with event: UIEvent?) {
        super.touchesBegan(touches, with: event)
        // location(in: nil) converts the touch to window/screen coordinates.
        guard let tapLocation = touches.first?.location(in: nil) else { return }
        onTap?(tapLocation)
    }
}
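For reference, the AVPlayerLayer route mentioned above avoids subclassing AVPlayerViewController entirely. A minimal sketch — PlayerView is a hypothetical container view, not part of the original code:

```swift
import AVFoundation
import UIKit

// Hedged sketch of the "AVPlayerLayer directly" route: a plain UIView hosting
// an AVPlayerLayer, with a standard tap recognizer instead of a subclass.
final class PlayerView: UIView {
    override class var layerClass: AnyClass { AVPlayerLayer.self }
    var playerLayer: AVPlayerLayer { layer as! AVPlayerLayer }

    var onTap: ((CGPoint) -> Void)?

    override init(frame: CGRect) {
        super.init(frame: frame)
        addGestureRecognizer(UITapGestureRecognizer(target: self,
                                                    action: #selector(handleTap)))
    }

    required init?(coder: NSCoder) { fatalError("init(coder:) has not been implemented") }

    @objc private func handleTap(_ recognizer: UITapGestureRecognizer) {
        // location(in: nil) yields window coordinates, matching the
        // globalRect comparison done in the recognition code below.
        onTap?(recognizer.location(in: nil))
    }
}
```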
The actual code to get the word at a location looks like this (playerVc points to a PlayerVC instance):
import Vision

func tap(tapLocation: CGPoint) {
    guard
        let subtitleLayer = playerVc.view.subviews.first?.subviews.first?.subviews.first?.layer.sublayers?.first?.sublayers?[1].sublayers?.first?.sublayers?.first?.sublayers?.first,
        CFGetTypeID(subtitleLayer.contents as CFTypeRef) == CGImage.typeID
    else { return }
    // The subtitle text (plus its rounded background) is rendered into this image.
    let image = subtitleLayer.contents as! CGImage
    let requestHandler = VNImageRequestHandler(cgImage: image)
    let recognizeTextHandler: (VNRequest, Error?) -> Void = { request, error in
        guard let observations = request.results as? [VNRecognizedTextObservation] else { return }
        for observation in observations {
            guard let topCandidate = observation.topCandidates(1).first,
                  !topCandidate.string.isEmpty else { continue }
            for word in topCandidate.string.components(separatedBy: " ") {
                // Note: range(of:) finds the first occurrence, so a repeated
                // word always maps to the bounds of its first match.
                guard let range = topCandidate.string.range(of: word) else { continue }
                if let boundingBox = try? topCandidate.boundingBox(for: range) {
                    // Vision uses normalized coordinates with the origin at the
                    // bottom left; flip the y-axis and scale up to the layer size.
                    let transform = CGAffineTransform.identity
                        .scaledBy(x: 1, y: -1)
                        .translatedBy(x: 0, y: -subtitleLayer.frame.size.height)
                        .scaledBy(x: subtitleLayer.frame.size.width, y: subtitleLayer.frame.size.height)
                    let convertedTopLeft = boundingBox.topLeft.applying(transform)
                    let convertedBottomRight = boundingBox.bottomRight.applying(transform)
                    let localRect = CGRect(x: convertedTopLeft.x,
                                           y: convertedTopLeft.y,
                                           width: convertedBottomRight.x - convertedTopLeft.x,
                                           height: convertedBottomRight.y - convertedTopLeft.y)
                    let globalRect = subtitleLayer.convert(localRect, to: nil)
                    if globalRect.contains(tapLocation) {
                        print("You tapped \(word)")
                    }
                }
            }
        }
    }
    let request = VNRecognizeTextRequest(completionHandler: recognizeTextHandler)
    request.usesLanguageCorrection = false
    request.recognitionLevel = .accurate
    do {
        try requestHandler.perform([request])
    } catch {
        print("Unable to perform the request: \(error)")
    }
}
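Putting it together, the wiring might look like this, assuming it runs inside a view controller that owns tap(tapLocation:) — videoURL is a placeholder for your own media URL:

```swift
let playerVc = PlayerVC()
playerVc.player = AVPlayer(url: videoURL) // videoURL: placeholder for your media URL
playerVc.onTap = { [weak self] location in
    // Text recognition is not cheap; consider dispatching this off the main queue.
    self?.tap(tapLocation: location)
}
present(playerVc, animated: true)
```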