In the example below I'm trying to access each 'div.searchRcrd', the children of 'content-area', but I'm lost as to how to go about it. I wrote a quick program to highlight my issue, using print statements to show that it isn't accessing the correct information. I've tried changing my doc.select to other variations such as ("div.content-area div.searchRcrd"), to no avail.
I've looked all over Stack Overflow before posting here, but I'm completely lost on this one. As always, I appreciate any advice on where I'm going wrong.
import java.io.IOException;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class Main
{
    public static void main(String[] args) throws IOException
    {
        // retrieve page source code
        Document doc = Jsoup.connect("https://uk.webuy.com/search/?categoryIds=1040&view=list&inStock=1").get();
        // find all of the div rows in content-area
        Elements rows = doc.select("div.content-area div");
        // loop over each row
        for (Element row : rows)
        {
            System.out.println("Test"); // Prints out 5 times instead of the multiple I expect
        }
    }
}
To see the reason, print the whole HTML page that Jsoup has loaded. You will notice that the page looks different in a web browser from what Jsoup sees. It looks like the page relies on JavaScript to load its content through Ajax requests, and Jsoup does not execute JavaScript, so those rows never appear in the document it parses.
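As a rough way to check this, the sketch below downloads the raw page and counts how often the class name appears in it. It uses only the JDK's HttpClient (Java 11+) so that it runs without extra dependencies; the class name RawHtmlCheck and the helper countOccurrences are made up for illustration, and it assumes the site answers a plain request without special headers.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class RawHtmlCheck {
    // Hypothetical helper: counts non-overlapping occurrences of marker in text.
    static int countOccurrences(String text, String marker) {
        if (marker.isEmpty()) {
            return 0;
        }
        int count = 0;
        int from = 0;
        while ((from = text.indexOf(marker, from)) != -1) {
            count++;
            from += marker.length();
        }
        return count;
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder(
                URI.create("https://uk.webuy.com/search/?categoryIds=1040&view=list&inStock=1"))
                .build();
        // This is the HTML as the server sends it, before any JavaScript runs.
        String html = client.send(request, HttpResponse.BodyHandlers.ofString()).body();
        System.out.println(html);
        System.out.println("searchRcrd occurrences: " + countOccurrences(html, "searchRcrd"));
    }
}
```

With Jsoup itself, printing doc.outerHtml() and doc.select("div.searchRcrd").size() should give the same kind of evidence. If the count is zero, the selector is not the problem: the rows are simply not in the HTML that was downloaded.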
Please take a look at the link below.
Edit: But there is an even better solution. You may notice that the data is loaded in a separate call. For example, the browser makes one extra call to show the page you provided:
Try to download it and use the Jackson library to parse it.
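A minimal sketch of the parsing step might look like the following. The JSON layout here is invented purely for illustration: the field names "items" and "name" are assumptions, not the real structure of the response, so inspect the actual payload and adjust the paths. It needs jackson-databind on the classpath.

```java
import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;
import java.util.ArrayList;
import java.util.List;

public class JsonResultsSketch {
    // Collects the "name" of every entry in the "items" array.
    // Both field names are hypothetical placeholders.
    static List<String> itemNames(String json) throws Exception {
        JsonNode root = new ObjectMapper().readTree(json);
        List<String> names = new ArrayList<>();
        // path() returns a "missing" node instead of null, so an absent
        // "items" field just yields an empty loop.
        for (JsonNode item : root.path("items")) {
            names.add(item.path("name").asText());
        }
        return names;
    }

    public static void main(String[] args) throws Exception {
        // Stand-in for the downloaded response body.
        String sample = "{\"items\":[{\"name\":\"Game A\"},{\"name\":\"Game B\"}]}";
        for (String name : itemNames(sample)) {
            System.out.println(name);
        }
    }
}
```

Reading the response as a tree with readTree avoids having to write model classes up front, which is handy while you are still working out what the payload contains.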