java web-scraping jsoup

jsoup does not find value of span id or class


I don't know how to describe my problem other than that jsoup seems to skip over the one value I need. I'm trying to grab the average engagement, likes, and comments for a selected user's Instagram posts, but let's stick with engagement for now.

So far in my testing, I've seen it skip the values in both <span id=...> and <span class=...> elements.

I have two versions of my code, neither of which produces a helpful result. For reference, this is what I see when I inspect the element on the page (https://analisa.io/profile/officialrickastley): <span class="js-summary-whole-engagement">4,300</span>

General:

import org.jsoup.*;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

Code Ver 1.

String accountUsername = "officialrickastley";
String url = "https://analisa.io/profile/" + accountUsername;
Document doc = Jsoup.connect(url).userAgent("Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/107.0.0.0 Safari/537.36").get();

Elements engagement = doc.getElementsByClass("js-summary-whole-engagement");
System.out.println(engagement);  

The above outputs: <span class="js-summary-whole-engagement"><i class="fas fa-spinner fa-spin"></i></span> I believe the inner <i> element is irrelevant and appears further down the page. But where I would expect the number inside the span, there is nothing.

Code Ver 2.

String accountUsername = "officialrickastley";
String url = "https://analisa.io/profile/" + accountUsername;
Document doc = Jsoup.connect(url).userAgent("Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/107.0.0.0 Safari/537.36").get();

Elements engagement = doc.getElementsByClass("js-summary-whole-engagement");
System.out.println(engagement.text()); 

The above outputs nothing, not even a space.

I've also tried doc.select and quite a few other things like .value, but nothing addresses the issue I'm having. I have also seen people parse the HTML directly from within the class, but if that is a possible solution, I'm unsure how to connect to the website and store the page for parsing, since I want the data to update every day.

Any help or suggestions would be greatly appreciated, thanks!


Solution

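  • jsoup only parses the HTML that the server returns; it does not run JavaScript. The span you see in the browser is filled in by a script after the page loads, which is why jsoup sees only the spinner placeholder. Similar statistics are available in the page's <meta> tags, so you could try this (read comments):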

    // Requires, in addition to the jsoup imports: import java.io.IOException;
    try {
        String accountUsername = "officialrickastley";
        String url = "https://analisa.io/profile/" + accountUsername;
        Document doc = Jsoup.connect(url).userAgent("Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/107.0.0.0 Safari/537.36").get();

        // Get name and Analisa handle from the "keywords" meta tag.
        // Note: first() returns null if the tag is missing from the page.
        String keyWords = doc.select("meta[name=\"keywords\"]").first().attr("content");
        String[] contParts = keyWords.split(",\\s");
        String name = contParts[0];
        String handle = contParts[1];

        // Get the desired stats from the "og:description" meta tag:
        String description = doc.select("meta[property=\"og:description\"]").first().attr("content");

        // Parse stats: split into fields on ", " and keep the first token of each field.
        contParts = description.split(",\\s");

        String engagementRate = contParts[0].split("\\s+")[0];
        String avgLikes = contParts[1].split("\\s+")[0];
        String avgComments = contParts[2].split("\\s+")[0];
        String followers = contParts[3].split("\\s+")[0];

        // Display stats:
        System.out.println("Name:            " + name + " (" + handle + ")");
        System.out.println("Engagement Rate: " + engagementRate);
        System.out.println("Likes Rate:      " + avgLikes + "%");
        System.out.println("Comments Rate:   " + avgComments + "%");
        System.out.println("Followers:       " + followers);
    } catch (IOException ex) {
        // Handle the exception whichever way you want, just don't leave it blank:
        System.err.println(ex);
    }


    The code above should output the following into the Console Window:

    Name:            Rick Astley (@officialrickastley Analisa)
    Engagement Rate: 2.44%
    Likes Rate:      2.37%
    Comments Rate:   0.07%
    Followers:       176,125
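
The split-based parsing above throws an ArrayIndexOutOfBoundsException if the description ever has fewer fields than expected. Below is a minimal sketch of a more defensive variant that uses only the Java standard library. The class name StatsParser, both method names, and the sample string are hypothetical; in particular, the sample only imitates the shape the code above relies on (comma-separated fields, each starting with its value) and is not copied from the site.

```java
import java.util.ArrayList;
import java.util.List;

class StatsParser {

    /**
     * Splits a description on a comma followed by whitespace and returns the
     * first whitespace-delimited token of each non-empty field.
     */
    public static List<String> leadingTokens(String description) {
        List<String> tokens = new ArrayList<>();
        if (description == null) {
            return tokens;
        }
        // A comma inside a number such as "176,125" is not followed by
        // whitespace, so it does not split the field.
        for (String field : description.split(",\\s+")) {
            String trimmed = field.trim();
            if (!trimmed.isEmpty()) {
                tokens.add(trimmed.split("\\s+")[0]);
            }
        }
        return tokens;
    }

    /** Returns the token at the given index, or the fallback if it is missing. */
    public static String tokenOrDefault(List<String> tokens, int index, String fallback) {
        return (index >= 0 && index < tokens.size()) ? tokens.get(index) : fallback;
    }

    public static void main(String[] args) {
        // Hypothetical sample; the real og:description wording may differ.
        String sample = "2.44% engagement rate, 2.37 average likes, "
                + "0.07 average comments, 176,125 followers";
        List<String> tokens = leadingTokens(sample);

        System.out.println("Engagement Rate: " + tokenOrDefault(tokens, 0, "n/a"));
        System.out.println("Likes Rate:      " + tokenOrDefault(tokens, 1, "n/a"));
        System.out.println("Comments Rate:   " + tokenOrDefault(tokens, 2, "n/a"));
        System.out.println("Followers:       " + tokenOrDefault(tokens, 3, "n/a"));
    }
}
```

To use it with the jsoup code, pass the content of the og:description meta tag to leadingTokens instead of the sample string. A missing field then prints as "n/a" rather than aborting the whole run, which matters if the job is meant to run unattended every day.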