
Jsoup scraping image url results in data:image/gif;base64,

I'm starting to learn Jsoup and want to scrape the Tesco web store. Here is a link:

I want to get the image of a product. When I inspect the page in Google Chrome, I see something like this:

<img src="" alt="Tesco British Unsalted Butter 250G" class="product-image" 768w, 4000w">

But my code:

Document doc = null;
try {
    // Fetch and parse the page (the URL is omitted here)
    doc = Jsoup.connect("").get();
} catch (IOException e) {
    e.printStackTrace();
}

results in:

<a href="/groceries/en-GB/products/295626079" aria-hidden="true" class="product-image-wrapper" tabindex="-1">
 <div class="product-image__container">
  <img src="" alt="Sterling Blue Superkings 100 Pack" class="product-image">

I think the problem is that the image URLs are filled in by JavaScript, which Jsoup does not execute. Is there any way to get the URL as I see it in Chrome, or should I use a more powerful tool such as HtmlUnit or Selenium?
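For illustration, here is a minimal sketch of how one might detect a placeholder `src` and fall back to a width-descriptor list like the `768w, 4000w` fragment seen in Chrome. It assumes that list lives in a `srcset` attribute and is actually present in the fetched markup, which is not the case in the Jsoup output shown above, so it may not help without a JavaScript-capable tool. `ImageUrlPicker` and its methods are hypothetical names, not part of Jsoup, and the `example.com` URLs are made up.

```java
// Hypothetical helper, not part of Jsoup: picks a usable image URL from the
// attribute values of an <img> whose src may be an inline placeholder.
final class ImageUrlPicker {

    // True when src is missing, empty, or an inline data: URI such as
    // "data:image/gif;base64,..." rather than a fetchable URL.
    static boolean isPlaceholder(String src) {
        if (src == null) return true;
        String s = src.trim();
        return s.isEmpty() || s.regionMatches(true, 0, "data:", 0, 5);
    }

    // Returns the URL with the largest width descriptor in a srcset value
    // such as "a.jpg 768w, b.jpg 4000w", or null if none can be parsed.
    // Simplification: assumes the URLs themselves contain no commas.
    static String largestFromSrcset(String srcset) {
        if (srcset == null) return null;
        String best = null;
        int bestWidth = -1;
        for (String candidate : srcset.split(",")) {
            String[] parts = candidate.trim().split("\\s+");
            if (parts.length != 2 || !parts[1].endsWith("w")) continue;
            try {
                int width = Integer.parseInt(parts[1].substring(0, parts[1].length() - 1));
                if (width > bestWidth) {
                    bestWidth = width;
                    best = parts[0];
                }
            } catch (NumberFormatException ignored) {
                // Skip candidates whose descriptor is not "<integer>w".
            }
        }
        return best;
    }

    // Prefers src; falls back to srcset when src is only a placeholder.
    static String resolve(String src, String srcset) {
        return isPlaceholder(src) ? largestFromSrcset(srcset) : src.trim();
    }

    public static void main(String[] args) {
        // Made-up values for demonstration only.
        String src = "data:image/gif;base64,R0lGODlhAQABAAAAACw=";
        String srcset = "https://example.com/small.jpg 768w, https://example.com/large.jpg 4000w";
        System.out.println(resolve(src, srcset));
    }
}
```

With Jsoup, the two strings could come from `img.attr("src")` and `img.attr("srcset")` on a selected element. If `resolve` returns null, the real URL is simply not in the static HTML.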


  • So basically I've just switched to Selenium. It may be slower, but at least I'm making progress. I also tried HtmlUnit, but it seems to handle JS badly.