I am trying to parse an HTML document and extract the URLs and the text from it. I am using the HTMLKit library for this task. For the URLs I use the following code:
func parseHTML() {
    browser.evaluateJavaScript("document.body.innerHTML") { (result, error) in
        guard let html = result as? String, error == nil else {
            print("Failed to get html string")
            return
        }
        let document = HTMLDocument(string: html)
        print("Create html doc")
        let urls: [String] = document.querySelectorAll("div").compactMap({ element in
            guard let src = element.attributes["href"] as? String else {
                return nil
            }
            return src
        })
        print("Found \(urls.count) urls \n")
    }
}
This all works well, but I don't know how to get the text between the tags.
HTML code:
<div class="V7Sr0 p5AXld PpBGzd YcUVQe">What are the alternatives now that the Google web search API has been ...</div>
How should I modify the code if I want to get the text "What are the alternatives now that the Google web search API has been ..."?
HTMLKit has a property that returns the text between an element's tags: HTMLElement.textContent
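As a minimal sketch of how textContent could replace the attribute lookup in the question's code: the helper name divTexts(in:) is hypothetical, and the sketch assumes that textContent is exposed to Swift as a non-optional String and that querySelectorAll returns an array of HTMLElement, as the question's code suggests.

```swift
import Foundation
import HTMLKit

/// Hypothetical helper: collects the trimmed text of every <div> in an HTML string.
/// Assumes HTMLKit's `textContent` is a non-optional String in Swift.
func divTexts(in html: String) -> [String] {
    let document = HTMLDocument(string: html)
    return document.querySelectorAll("div")
        // textContent returns the text of the element and all of its descendants
        .map { $0.textContent.trimmingCharacters(in: .whitespacesAndNewlines) }
        // Skip elements that contain no visible text
        .filter { !$0.isEmpty }
}
```

Inside parseHTML() this could be called as `let texts = divTexts(in: html)`. Since a div's text includes the text of any nested divs, a narrower selector such as "div.V7Sr0" may be needed to avoid duplicated results.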
Or you can use a regex without HTMLKit, for example: (?<=>)(.*)(?=<)
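The regex route could look roughly like this with Foundation's NSRegularExpression. The helper name matches(of:in:) is hypothetical, and the sketch only covers the single-element case from the question: because .* is greedy, a line containing several tags would be captured as one match, markup included.

```swift
import Foundation

/// Hypothetical helper: returns every substring of `html` matched by `pattern`.
func matches(of pattern: String, in html: String) -> [String] {
    guard let regex = try? NSRegularExpression(pattern: pattern) else {
        return []  // the pattern failed to compile
    }
    let fullRange = NSRange(html.startIndex..., in: html)
    return regex.matches(in: html, range: fullRange).compactMap { match in
        // Convert the NSRange of each match back into a Swift substring
        Range(match.range, in: html).map { String(html[$0]) }
    }
}

let html = "<div class=\"V7Sr0 p5AXld PpBGzd YcUVQe\">What are the alternatives now that the Google web search API has been ...</div>"

// Lookbehind for ">" and lookahead for "<" keep the tags out of the match
print(matches(of: "(?<=>)(.*)(?=<)", in: html))
// prints ["What are the alternatives now that the Google web search API has been ..."]
```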