I have persistent problems parsing an HTML page for long tag names with Jsoup.
In my case, I'm trying to extract the <ytd-video-renderer> elements from a YouTube search page. However I try it, no reliable (or even non-empty) list is returned.
Things I've tried so far, each called on the HTML Document object doc:
.select("ytd-video-renderer") (to no avail; the list is empty)
.getElementsByClass("ytd-item-section-renderer") (a class that only occurs on ytd-video-renderer)
.select("ytd-video-renderer.ytd-item-section-renderer")
.select("ytd-video-renderer[class*=ytd-item-section-renderer]")
.select("div#dismissable") (the sole div under ytd-video-renderer)
And a lot more variations with other parameters...
I also gave other tags a shot, but I run into the same problem.
The closest I've come to success was .select("a[href*=watch]"). This returns all the video titles, but sadly also some other links with other text, so it is not reliable.
I have Java 8 installed and the latest version of Jsoup.
Here is the code implementing Jsoup:
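    import java.io.IOException;

    import org.jsoup.Jsoup;
    import org.jsoup.nodes.Document;
    import org.jsoup.select.Elements;

    public class SearchPage {
        private Document doc;

        public SearchPage(String url) {
            try {
                // Fetch and parse the page at the given URL
                doc = Jsoup.connect(url).get();
            } catch (IOException ex) {
                // taking care of my error cats
            }
        }

        public Elements test() { // just to test
            return doc.getElementsByTag("ytd-item-renderer");
        }
    }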
Example of what I'm trying to extract: [image: the HTML blocks I'm looking for]
It seems as if I'm missing something, as many say Jsoup is awesome and easy... (not in my case then :v)
What I'd like to get is a list of every element I ask for. Next I want to parse each element again, but let's solve this first. Hopefully that will give me the know-how to solve the rest. Right now I get an empty list every single time.
Thank you very much.
The contents of the YouTube search page that you are looking at are rendered by your browser via JavaScript. The line Jsoup.connect(url).get() only fetches the HTML that the server sends; it does not execute any JavaScript. If you request the page with cURL or some other command-line tool, you will find that the elements you are looking for are not there.
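To make this concrete, here is a minimal sketch that does not touch YouTube at all. RAW_HTML is a made-up stand-in (an assumption, not YouTube's actual markup) for a page whose custom element is only created by a script. The class and method names are hypothetical as well. Jsoup parses the markup as text and never runs the script, so the selector finds nothing, while the same selector works fine once the tag is really present in the markup.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public class StaticHtmlDemo {
    // Hypothetical page: the custom element exists only after the script runs.
    static final String RAW_HTML =
          "<html><body>"
        + "<div id=\"contents\"></div>"
        + "<script>"
        + "document.getElementById('contents')"
        + ".appendChild(document.createElement('ytd-video-renderer'));"
        + "</script>"
        + "</body></html>";

    // Hypothetical page where the tag is part of the served markup itself.
    static final String PRESENT_HTML =
        "<div><ytd-video-renderer></ytd-video-renderer></div>";

    // Counts the elements matching cssQuery in the parsed (never executed) HTML.
    static int count(String html, String cssQuery) {
        Document doc = Jsoup.parse(html);
        return doc.select(cssQuery).size();
    }

    public static void main(String[] args) {
        System.out.println(count(RAW_HTML, "ytd-video-renderer"));     // 0
        System.out.println(count(RAW_HTML, "div#contents"));           // 1
        System.out.println(count(PRESENT_HTML, "ytd-video-renderer")); // 1
    }
}
```

So the length of the tag name is not the problem: the selector is fine, the elements are simply missing from what Jsoup receives.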
I'm not sure exactly what your goal is, but you may want to look into the YouTube API to see if there is an easier way to do what you want.
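As a rough starting point, assuming the YouTube Data API v3 search endpoint fits your use case, a request URL could be built like this. YOUR_API_KEY is a placeholder you would replace with a key from your own Google developer account, the class and method names are hypothetical, and the parameters shown (part, type, maxResults, q, key) should be checked against the API documentation before you rely on them.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

public class YouTubeSearchUrl {
    static final String SEARCH_ENDPOINT =
        "https://www.googleapis.com/youtube/v3/search";

    // Builds a search request URL; the query and key are URL-encoded.
    static String buildSearchUrl(String query, String apiKey, int maxResults) {
        try {
            return SEARCH_ENDPOINT
                + "?part=snippet"
                + "&type=video"
                + "&maxResults=" + maxResults
                + "&q=" + URLEncoder.encode(query, "UTF-8")
                + "&key=" + URLEncoder.encode(apiKey, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is always supported, so this should never happen.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Placeholder key; the real request will fail without a valid one.
        System.out.println(buildSearchUrl("jsoup tutorial", "YOUR_API_KEY", 5));
    }
}
```

The response is JSON rather than HTML, so you would read it with a JSON library instead of Jsoup selectors.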