android jsoup html-parsing

Why does the img tag have no src value after parsing with jsoup?


I want to get the src value from an HTML img tag. In Chrome's Inspect Element I can see the value of src, but when I parse the page with the jsoup library, src has no value. Here's my code:

document = Jsoup.connect("http://estelam.rahvar120.ir/index.jsp?pageid=2371666&p=1")
        .userAgent(USERAGENT)
        .method(Connection.Method.GET)
        .execute()
        .parse();

Element element = document.select("img[id=capimg]").first(); // img tag element
String absoluteUrl = element.absUrl("src"); // absoluteUrl = ""
String srcValue = element.attr("src"); // srcValue = ""

The website isn't reachable from other countries, but the part of the HTML I want to parse is:

<img id="capimg" alt="Enter Captcha :" src="" width="200" height="60">

The problem is that jsoup gets the HTML content before JavaScript sets the src value. What should I do?


Solution

  • Welcome to SO!

    The problem you are facing cannot be solved with Jsoup alone, because Jsoup is an HTML parser, not a browser. It does not execute JavaScript, so any content that a script adds or changes after the page loads will not appear in the document Jsoup parses.
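
A small sketch of why the src attribute comes back empty: Jsoup parses the markup it is given and never runs the scripts inside it. The HTML string, the class name, and the captcha.jpg path below are made up for illustration; they are not taken from the real page.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class StaticParseDemo {

    // Hypothetical markup: in a browser the script would set src,
    // but Jsoup only stores the script as data and never executes it.
    static final String HTML =
        "<html><body>"
        + "<img id=\"capimg\" src=\"\" width=\"200\" height=\"60\">"
        + "<script>document.getElementById('capimg').src = 'captcha.jpg';</script>"
        + "</body></html>";

    // Returns the src attribute exactly as it appears in the parsed markup,
    // or null if the img element is missing.
    static String parsedSrc(String html) {
        Document document = Jsoup.parse(html);
        Element img = document.selectFirst("img#capimg");
        return img == null ? null : img.attr("src");
    }

    public static void main(String[] args) {
        // The attribute is still empty because the script was never executed.
        System.out.println("src = \"" + parsedSrc(HTML) + "\"");
    }
}
```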

    What you need is another tool that simulates a web browser, such as Selenium.

    There are multiple ways to do this:

    1. Use Selenium to handle page retrieval and scraping.
    2. Use Selenium to get the dynamic pages and use Jsoup to parse and scrape the content.

    I personally recommend the second approach because I am more comfortable using Jsoup to scrape.
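
The second approach could look roughly like the sketch below: Selenium loads the page and waits for the script to fill in src, then the rendered HTML is handed to Jsoup. This is only a sketch under assumptions: it uses Selenium 4 with ChromeDriver and a recent Chrome on a desktop JVM (not inside an Android app), and the class name, method names, and 10-second timeout are illustrative choices, not part of the original answer.

```java
import java.time.Duration;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.openqa.selenium.By;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.chrome.ChromeDriver;
import org.openqa.selenium.chrome.ChromeOptions;
import org.openqa.selenium.support.ui.WebDriverWait;

public class RenderedPageScraper {

    // Step 1: let a real browser run the page's JavaScript and return the resulting HTML.
    static String fetchRenderedHtml(String url) {
        ChromeOptions options = new ChromeOptions();
        options.addArguments("--headless=new"); // assumption: a recent Chrome is installed
        WebDriver driver = new ChromeDriver(options);
        try {
            driver.get(url);
            // Wait up to 10 seconds (an arbitrary choice) until the script has set a non-empty src.
            new WebDriverWait(driver, Duration.ofSeconds(10)).until(d -> {
                String src = d.findElement(By.id("capimg")).getDomAttribute("src");
                return src != null && !src.isEmpty();
            });
            return driver.getPageSource();
        } finally {
            driver.quit(); // always close the browser, even if the wait times out
        }
    }

    // Step 2: parse the rendered HTML with Jsoup, as in the question's code.
    // Returns an empty string if the img element is missing.
    static String extractCaptchaUrl(String renderedHtml, String baseUri) {
        Document document = Jsoup.parse(renderedHtml, baseUri);
        Element img = document.selectFirst("img#capimg");
        return img == null ? "" : img.absUrl("src");
    }

    public static void main(String[] args) {
        String url = "http://estelam.rahvar120.ir/index.jsp?pageid=2371666&p=1";
        String html = fetchRenderedHtml(url);
        System.out.println(extractCaptchaUrl(html, url));
    }
}
```

Passing the page URL as the base URI matters here: without it, absUrl("src") cannot resolve a relative src and returns an empty string.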