I'm trying to get content inside a script tag (JSON data) from a recipe in an HTML page, using JSoup (1.13.1). I won't post the HTML code but the script tag content is pretty big.
Whenever I try to print the content, I get an empty string. I tried to get my data using different methods: by selecting the ID with doc.select("#__NEXT_DATA__"), or by using doc.select("script[type='application/json']").
If I try to iterate through all the script tags, whenever it gets to the script tag I want, it prints blank.
I also tried to print the content using the text() method and the toString() method, but it doesn't work. I even saw someone saying you could set maxBodySize(0), but it still doesn't work.
Here is my code:
String url = "https://www.marmiton.org/recettes/recette_gateau-au-chocolat-fondant-rapide_166352.aspx";
Document doc = Jsoup.connect(url).maxBodySize(0).get();
Elements newsHeadlines = doc.select("#__NEXT_DATA__");
for (Element element : newsHeadlines) {
    System.out.println(element);
}
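Treat the script element as data. jsoup keeps the content of a script tag as data rather than as text, so read it with data() instead of text():
Elements newsHeadlines = doc.select("#__NEXT_DATA__");
for (Element element : newsHeadlines) {
    System.out.println(element.data());
}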
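To see the difference between text() and data() without hitting the network, here is a minimal sketch that parses a small inline page. The HTML string, the class name ScriptDataDemo and the helper names scriptData and scriptText are made up for illustration and are not part of the original page or of jsoup; only Jsoup.parse, getElementById, text() and data() are library calls.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

class ScriptDataDemo {
    // Hypothetical stand-in for the real page; the JSON is invented for this example.
    static final String HTML =
        "<html><body>"
        + "<script id=\"__NEXT_DATA__\" type=\"application/json\">{\"props\":{\"id\":1}}</script>"
        + "</body></html>";

    /** Returns the raw content of the element with the given id, or "" if it is absent. */
    static String scriptData(String html, String id) {
        Document doc = Jsoup.parse(html);
        Element script = doc.getElementById(id);
        return script == null ? "" : script.data();
    }

    /** Returns what text() reports for the same element, or "" if it is absent. */
    static String scriptText(String html, String id) {
        Document doc = Jsoup.parse(html);
        Element script = doc.getElementById(id);
        return script == null ? "" : script.text();
    }

    public static void main(String[] args) {
        // Brackets make an empty result visible on the console.
        System.out.println("text(): [" + scriptText(HTML, "__NEXT_DATA__") + "]");
        System.out.println("data(): [" + scriptData(HTML, "__NEXT_DATA__") + "]");
    }
}
```

With jsoup on the classpath, the text() line should come out empty between the brackets, while the data() line should show the JSON string.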
Note that some consoles may have trouble displaying a single line of 81206 characters (Eclipse did for me), or there may have been something in the data that interfered, so this code prints only the length and the first 100 characters:
for (Element element : newsHeadlines) {
    System.out.println(element.data().length());
    int printLen = Math.min(100, element.data().length());
    System.out.println(element.data().substring(0, printLen));
}
And produces:
81206
{"props":{"pageProps":{"recipeData":{"recipe":{"id":166352,"guid":"7bf48b95-4cd2-4b32-8f41-fb6168510
Note that if you can use a debugger in your environment, it would show that the element held the content all along, but as a childNode of element of type DataNode, which is the first clue.
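If no debugger is at hand, roughly the same clue can be obtained in code by listing the child nodes of the element. This is only a sketch: the class name ChildNodeInspector, the helper describeChildren and the sample HTML are hypothetical; childNodes(), selectFirst and DataNode.getWholeData() are existing jsoup calls.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.DataNode;
import org.jsoup.nodes.Element;
import org.jsoup.nodes.Node;

class ChildNodeInspector {
    /** Lists the node type of each child of the first element matching cssQuery. */
    static String describeChildren(String html, String cssQuery) {
        Element element = Jsoup.parse(html).selectFirst(cssQuery);
        if (element == null) {
            return "no match";
        }
        StringBuilder sb = new StringBuilder();
        for (Node child : element.childNodes()) {
            sb.append(child.getClass().getSimpleName());
            if (child instanceof DataNode) {
                // A DataNode holds raw script/style content that text() skips.
                sb.append(" length=").append(((DataNode) child).getWholeData().length());
            }
            sb.append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Invented sample markup, standing in for the real page.
        String html = "<script id=\"__NEXT_DATA__\" type=\"application/json\">{\"a\":1}</script>"
            + "<p id=\"intro\">hello</p>";
        System.out.print(describeChildren(html, "#__NEXT_DATA__"));
        System.out.print(describeChildren(html, "#intro"));
    }
}
```

The script element should report a DataNode child, while the paragraph should report a TextNode, which is why text() finds content in one and not the other.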