Tags: java, html, jsoup

Java - How do I access the children of a div using Jsoup


In the example below I'm trying to access each 'div.searchRcrd', the children of 'content-area', but I can't work out how to reach them. I wrote a quick program to highlight the issue, using print statements to show that it isn't picking up the right elements. I've tried changing my doc.select to other variations such as ("div.content-area div.searchRcrd"), to no avail.

I've looked all over Stack Overflow before posting here, but I'm completely lost on this one. As always, I appreciate any advice on where I'm going wrong.

import java.io.IOException;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class Main
{
    public static void main(String[] args) throws IOException
    {
        // retrieve page source code
        Document doc = Jsoup.connect("https://uk.webuy.com/search/?categoryIds=1040&view=list&inStock=1").get();

        // find all of the div rows in content-area
        Elements rows = doc.select("div.content-area div");

        // loop over each row
        for (Element row : rows)
        {
            System.out.println("Test"); // Prints out 5 times instead of the multiple I expect
        }
    }
}



Solution

  • To see the reason, print the whole HTML document that Jsoup has loaded. You will notice that it differs from the page shown in a web browser. Jsoup does not execute JavaScript, and this page appears to load its results through Ajax requests made by JavaScript, so the 'div.searchRcrd' elements are never present in the HTML that Jsoup receives.

    For background, see this related question:

    1. Page content is loaded with JavaScript and Jsoup doesn't see it

    Edit: There is an even better solution. Notice that the data is loaded in a separate call. For example, to render the page you provided, the browser makes one extra request:

    https://wss2.cex.uk.webuy.io/v3/boxes?inStock=1&categoryIds=[1040]&firstRecord=1&count=50&sortBy=relevance&sortOrder=desc

    Try downloading that response directly and parsing it with the Jackson library, instead of scraping the rendered HTML.
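
As a rough illustration of that last suggestion, here is a minimal sketch that requests the data URL with the JDK's built-in `java.net.http.HttpClient` (Java 11+) and prints the raw response. The class name `BoxesRequest` and the helpers `buildUri` and `fetch` are made up for this example. The endpoint and query parameters are copied from the URL above; it is not a documented API, so it may change or reject the request. Percent-encoding the square brackets is an assumption about what the server accepts.

```java
import java.io.IOException;
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

class BoxesRequest {
    // Endpoint taken from the URL in the answer; undocumented, so treat it as an assumption.
    private static final String ENDPOINT = "https://wss2.cex.uk.webuy.io/v3/boxes";

    // Builds the request URI, percent-encoding the bracketed category list.
    static URI buildUri(int categoryId, int firstRecord, int count) {
        String categoryIds = URLEncoder.encode("[" + categoryId + "]", StandardCharsets.UTF_8);
        return URI.create(ENDPOINT
                + "?inStock=1"
                + "&categoryIds=" + categoryIds
                + "&firstRecord=" + firstRecord
                + "&count=" + count
                + "&sortBy=relevance&sortOrder=desc");
    }

    // Downloads the response body as a string; throws if the status is not 200.
    static String fetch(URI uri) throws IOException, InterruptedException {
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder(uri)
                .header("Accept", "application/json")
                .GET()
                .build();
        HttpResponse<String> response =
                client.send(request, HttpResponse.BodyHandlers.ofString());
        if (response.statusCode() != 200) {
            throw new IOException("Unexpected HTTP status " + response.statusCode());
        }
        return response.body();
    }

    public static void main(String[] args) throws Exception {
        URI uri = buildUri(1040, 1, 50);
        System.out.println(uri);
        // Print the raw body first to see which fields the response contains.
        System.out.println(fetch(uri));
    }
}
```

Once you have seen what the body looks like, the string can be handed to Jackson, for instance via `ObjectMapper.readTree(String)`, which returns a `JsonNode` you can walk. The field names are not shown here because the response structure isn't given above; inspect the printed output to find them.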