How can I scrape data from a website using the Jaunt library?


I want to get the item titles from this feed: http://feeds.foxnews.com/foxnews/latest

For example, given this element:

<title><![CDATA[SUCCESSFUL INTERCEPT Pentagon confirms it shot down ICBM-type target]]></title>

I want to print only the text, like this:

"SUCCESSFUL INTERCEPT Pentagon confirms it shot down ICBM-type target US conducts successful missile intercept test, Pentagon says"

Here is my code, which uses the Jaunt library.

I don't know why it only prints "foxnew.com".

import com.jaunt.JauntException;
import com.jaunt.UserAgent;

public class p8_1
{

    public static void main(String[] args)
    {
        try
        {
            UserAgent userAgent = new UserAgent();
            userAgent.visit("http://feeds.foxnews.com/foxnews/latest"); 
            String title = userAgent.doc.findFirst(
                    "<title><![CDATA[SUCCESSFUL INTERCEPT Pentagon confirms it shot down ICBM-type target]]></title>")
                    .getText();
            System.out.println("\n " + title);
        } catch (JauntException e)
        {
            System.err.println(e);
        }

    }

}

Solution

  • Search for element types (tags such as <title>), not for their values.

    Try the following to get the title text of each item in the feed:

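    import com.jaunt.Element;
    import com.jaunt.Elements;
    import com.jaunt.JauntException;
    import com.jaunt.UserAgent;

    public class p8_1 {
        public static void main(String[] args) {
            try {
                UserAgent userAgent = new UserAgent();
                userAgent.visit("http://feeds.foxnews.com/foxnews/latest");

                // Query by tag name: every <item>, then every <title> inside those items.
                Elements items = userAgent.doc.findEach("<item>");
                Elements titles = items.findEach("<title>");

                for (Element title : titles) {
                    // The headline sits in a CDATA section; this answer reads it through getComment(0).
                    String titleText = title.getComment(0).getText();
                    System.out.println(titleText);
                }
            } catch (JauntException e) {
                System.err.println(e);
            }
        }
    }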
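
For comparison, the same "search by element type" idea can be sketched without Jaunt, using only the XML parser that ships with the JDK. This is a swapped-in technique, not the Jaunt API: the class name FeedTitles, the method itemTitles, and the inline sample feed are all hypothetical, and the sketch parses a string rather than fetching the live URL. It assumes the feed is well-formed RSS in which each <item> has a <title> child.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper class, not part of Jaunt.
class FeedTitles {

    // Returns the <title> text of every <item> in an RSS document.
    // The channel-level <title> is skipped because only <item> elements are searched.
    static List<String> itemTitles(String rssXml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(new InputSource(new StringReader(rssXml)));

        List<String> titles = new ArrayList<>();
        NodeList items = doc.getElementsByTagName("item");
        for (int i = 0; i < items.getLength(); i++) {
            Element item = (Element) items.item(i);
            NodeList titleNodes = item.getElementsByTagName("title");
            if (titleNodes.getLength() > 0) {
                // getTextContent() includes the content of CDATA sections.
                titles.add(titleNodes.item(0).getTextContent().trim());
            }
        }
        return titles;
    }

    public static void main(String[] args) throws Exception {
        // Made-up sample feed that reuses the two headlines quoted in the question.
        String sample =
              "<rss><channel><title>Sample channel</title>"
            + "<item><title><![CDATA[SUCCESSFUL INTERCEPT Pentagon confirms it shot down ICBM-type target]]></title></item>"
            + "<item><title><![CDATA[US conducts successful missile intercept test, Pentagon says]]></title></item>"
            + "</channel></rss>";

        for (String title : itemTitles(sample)) {
            System.out.println(title);
        }
    }
}
```

Restricting the search to <item> elements first is what keeps the channel's own <title> out of the results, which mirrors the two-step findEach in the answer above.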