java, html, web-scraping, jsoup, imdb

How to save an image from an HTML webpage with JSoup


I am trying to use JSoup to scrape the poster image from an IMDb link and save it so that my program can use it later. This is what I have so far:

import java.io.IOException;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Attributes;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class JSoupTest
{

    public static void main(String[] args)
    {

        String address = "https://www.imdb.com/title/tt1270797/";
        try
        {
            Document doc = Jsoup.connect(address).get();
            // Selects the first div with class "poster"; null if none is found
            Element link = doc.select("div.poster").first();
        }
        catch (IOException e)
        {
            // Auto-generated catch block
            e.printStackTrace();
        }
    }

}

Now, I know the image is inside a div with the class "poster", but I can't work out how to extract it. Please bear with me, as I have no prior experience with JSoup. Thanks a lot.


Solution

  • I've been using JSoup for a while, but I've never tried to download an image from an HTML source.

    After getting the document as you did above, you can get the div you want with getElementsByClass (the Elements type needs import org.jsoup.select.Elements;):

    Elements divs = doc.getElementsByClass("poster");
    

    The code above returns every element with the class 'poster'.

    If you are sure there's only one div with the class 'poster', you can do:

    Element poster = divs.first();
    

    If you aren't sure of that, you'll need to find a way to differentiate that div from the others.
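
One way to narrow the match is a CSS selector combined with a null check. The sketch below is only an illustration: the class name PosterSelector, the method findPosterUrl, and the inline HTML are assumptions, and the real page markup may differ. It relies on Document.selectFirst, which is available in JSoup 1.11.1 and later.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

class PosterSelector {

    /**
     * Returns the absolute URL of the first img (with a src attribute)
     * nested inside a div with class "poster", or null if there is none.
     */
    static String findPosterUrl(Document doc) {
        // Descendant selector: only images inside div.poster are considered
        Element img = doc.selectFirst("div.poster img[src]");
        if (img == null) {
            return null;
        }
        // absUrl resolves a relative src against the document's base URI
        return img.absUrl("src");
    }

    public static void main(String[] args) {
        // Hypothetical markup for illustration only
        String html = "<div class=\"poster\"><a href=\"/viewer\">"
                + "<img src=\"/images/poster.jpg\"></a></div>";
        Document doc = Jsoup.parse(html, "https://example.com/");
        System.out.println(findPosterUrl(doc));
    }
}
```

If several divs share the class, a more specific selector (for example one that also names a parent container) would be needed; which one works depends on the actual page.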

    Now that you have your 'poster' div, you can get the link inside it with:

    Elements image = poster.getElementsByTag("a");
    

    The code above returns all links inside the 'poster' div. As before, if you're sure there's only one link inside the 'poster' div, you can do:

    Element downloadImage = image.first();
    

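    Now you have the link element that wraps the image you want. Assuming the img tag sits inside that link, the image URL itself is its src attribute, which you can read with downloadImage.getElementsByTag("img").first().absUrl("src").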
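
The steps above stop at locating the image, so here is a minimal sketch of the saving part using only the standard library. The class name ImageSaver and the method saveImage are hypothetical, and the sketch assumes the server hands the image to a plain request; some sites may require extra headers such as a User-Agent.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;

class ImageSaver {

    /**
     * Streams the bytes at imageUrl into target, overwriting any existing
     * file, and returns the number of bytes written.
     */
    static long saveImage(String imageUrl, Path target) throws IOException {
        // try-with-resources closes the stream even if the copy fails
        try (InputStream in = URI.create(imageUrl).toURL().openStream()) {
            return Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: ImageSaver <image-url> <output-file>");
            return;
        }
        long bytes = saveImage(args[0], Paths.get(args[1]));
        System.out.println("Saved " + bytes + " bytes to " + args[1]);
    }
}
```

The URL passed in would be the absolute src value obtained from the img element, and the saved file can then be loaded by the rest of the program like any other local image.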