Jsoup selector: parse an element's href and title


HTML from Gutenberg:

<li class="booklink">
    <a class="table link" href="/ebooks/4300.mobile" accesskey="5">
        <span class="row">
            <span class="cell leftcell">
                <span class="icon icon_book"></span>
            </span>
            <span class="cell content">
                <span class="title">Ulysses</span>
                <span class="subtitle">James Joyce</span>
                <span class="extra">7824 downloads</span>
            </span>
            <span class="cell rightcell">
                <span class="icon icon_next"></span>
            </span>
        </span>
    </a>
</li>

I want to parse HTML like this and get the link's href and the book title using jsoup.

I have tried several approaches, but none of them worked.


Solution

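  • Select each li.booklink item, then read the link's href with absUrl and the title from the nested span. Try this: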

    import java.io.IOException;
    import java.util.List;
    
    import org.jsoup.Jsoup;
    import org.jsoup.nodes.Document;
    import org.jsoup.nodes.Element;
    
    public class BookScraper {
    
        public static void main(String[] args) throws IOException {
    
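            // Fetch the mobile search results page; relative hrefs are resolved against this URL.
            Document document = Jsoup.connect("https://m.gutenberg.org/ebooks/search.mobile/?query=ulysses").get();
            // Each search result is an <li class="booklink"> inside the results list.
            List<Element> bookLinks = document.select("body > div.content > ol > li[class=booklink]");
    
            for (Element bookLink : bookLinks) {
    
                // The anchor carries both classes "table" and "link"; skip items without one
                // instead of letting get(0) throw on an empty result.
                Element anchor = bookLink.selectFirst(".table.link");
                if (anchor == null) {
                    continue;
                }
                // absUrl turns the relative href (e.g. /ebooks/4300.mobile) into a full URL.
                String href = anchor.absUrl("href");
                // text() on an empty selection returns an empty string, so these calls are safe.
                String title = bookLink.select(".cell.content .title").text();
                String subTitle = bookLink.select(".cell.content .subtitle").text();
                String extra = bookLink.select(".cell.content .extra").text();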
    
                System.out.println("Link : " + href);
                System.out.println("    Title    : " + title);
                System.out.println("    Subtitle : " + subTitle);
                System.out.println("    Info     : " + extra);
            }
        }
    
    }
    

    Sample Output:

    Link : https://m.gutenberg.org/ebooks/4300.mobile
        Title    : Ulysses
        Subtitle : James Joyce
        Info     : 7824 downloads
    Link : https://m.gutenberg.org/ebooks/4367.mobile
        Title    : Personal Memoirs of U. S. Grant, Complete
        Subtitle : Ulysses S. Grant
        Info     : 1459 downloads
    Link : https://m.gutenberg.org/ebooks/20151.mobile
        Title    : Hidden Treasures; Or, Why Some Succeed While Others Fail
        Subtitle : Harry A. Lewis
        Info     : 199 downloads
    Link : https://m.gutenberg.org/ebooks/32884.mobile
        Title    : Ideas of Good and Evil
        Subtitle : W. B. Yeats
        Info     : 143 downloads
    Link : https://m.gutenberg.org/ebooks/35742.mobile
        Title    : American Leaders and Heroes: A preliminary text-book in United States History
        Subtitle : Wilbur F. Gordy
        Info     : 143 downloads
    Link : https://m.gutenberg.org/ebooks/32326.mobile
        Title    : Tales of Troy and Greece
        Subtitle : Andrew Lang
        Info     : 118 downloads
    Link : https://m.gutenberg.org/ebooks/7768.mobile
        Title    : The Adventures of Ulysses
        Subtitle : Charles Lamb
        Info     : 108 downloads
    Link : https://m.gutenberg.org/ebooks/11490.mobile
        Title    : American Negro Slavery
        Subtitle : Ulrich Bonnell Phillips
        Info     : 102 downloads
    Link : https://m.gutenberg.org/ebooks/17667.mobile
        Title    : Dialogues of the Dead
        Subtitle : Baron George Lyttelton Lyttelton and Mrs. Montagu
        Info     : 98 downloads
    Link : https://m.gutenberg.org/ebooks/2851.mobile
        Title    : Sixes and Sevens
        Subtitle : O. Henry
        Info     : 97 downloads
    Link : https://m.gutenberg.org/ebooks/32728.mobile
        Title    : The English in the West Indies; Or, The Bow of Ulysses
        Subtitle : James Anthony Froude
        Info     : 69 downloads
    Link : https://m.gutenberg.org/ebooks/41935.mobile
        Title    : The Adventures of Ulysses the Wanderer
        Subtitle : Homer and Guy Thorne
        Info     : 67 downloads
    Link : https://m.gutenberg.org/ebooks/32628.mobile
        Title    : The Child's Book of American Biography
        Subtitle : Mary Stoyell Stimpson
        Info     : 63 downloads
    Link : https://m.gutenberg.org/ebooks/29659.mobile
        Title    : Manual of American Grape-Growing
        Subtitle : U. P. Hedrick
        Info     : 54 downloads
    Link : https://m.gutenberg.org/ebooks/46327.mobile
        Title    : The Cherries of New York
        Subtitle : U. P. Hedrick
        Info     : 47 downloads
    Link : https://m.gutenberg.org/ebooks/5860.mobile
        Title    : Personal Memoirs of U. S. Grant, Part 1.
        Subtitle : Ulysses S. Grant
        Info     : 46 downloads
    Link : https://m.gutenberg.org/ebooks/51076.mobile
        Title    : Aaron Rodd, Diviner
        Subtitle : E. Phillips Oppenheim
        Info     : 34 downloads
    Link : https://m.gutenberg.org/ebooks/45978.mobile
        Title    : The Grapes of New York
        Subtitle : U. P. Hedrick
        Info     : 33 downloads
    Link : https://m.gutenberg.org/ebooks/46347.mobile
        Title    : Men of Our Times; Or, Leading Patriots of the Day
        Subtitle : Harriet Beecher Stowe
        Info     : 31 downloads
    Link : https://m.gutenberg.org/ebooks/4546.mobile
        Title    : Memoirs of the Union's Three Great Civil War Generals
        Subtitle : Ulysses S. Grant, William T. Sherman, and Philip Henry Sheridan
        Info     : 30 downloads
    Link : https://m.gutenberg.org/ebooks/47263.mobile
        Title    : The Peaches of New York
        Subtitle : U. P. Hedrick
        Info     : 30 downloads
    Link : https://m.gutenberg.org/ebooks/39626.mobile
        Title    : An Alphabet of History
        Subtitle : Wilbur D. Nesbit
        Info     : 28 downloads
    Link : https://m.gutenberg.org/ebooks/46994.mobile
        Title    : The Pears of New York
        Subtitle : U. P. Hedrick
        Info     : 27 downloads
    Link : https://m.gutenberg.org/ebooks/43982.mobile
        Title    : Stories of the Old World
        Subtitle : Alfred John Church
        Info     : 26 downloads
    Link : https://m.gutenberg.org/ebooks/28386.mobile
        Title    : Ulysses S. Grant
        Subtitle : Walter Allen
        Info     : 25 downloads
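
To try the selectors without making a network request, the snippet from the question can also be parsed from a string. The following is a minimal sketch, assuming jsoup 1.11.1 or later is on the classpath (needed for selectFirst). The class name BookSnippetParser, the helper firstHrefAndTitle, and the base URI https://m.gutenberg.org/ are illustrative choices and not part of the original answer.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class BookSnippetParser {

    // Hypothetical helper: returns {href, title} for the first booklink item,
    // or null when the HTML contains no matching link.
    static String[] firstHrefAndTitle(String html, String baseUri) {
        // The base URI is what absUrl() resolves relative hrefs against.
        Document document = Jsoup.parse(html, baseUri);

        Element link = document.selectFirst("li.booklink > a[href]");
        if (link == null) {
            return null;
        }

        // The title lives in a nested <span class="title">; fall back to "" if absent.
        Element title = link.selectFirst("span.title");
        return new String[] { link.absUrl("href"), title == null ? "" : title.text() };
    }

    public static void main(String[] args) {
        // Shortened version of the HTML from the question.
        String html =
              "<li class=\"booklink\">"
            + "  <a class=\"table link\" href=\"/ebooks/4300.mobile\" accesskey=\"5\">"
            + "    <span class=\"cell content\">"
            + "      <span class=\"title\">Ulysses</span>"
            + "      <span class=\"subtitle\">James Joyce</span>"
            + "    </span>"
            + "  </a>"
            + "</li>";

        String[] result = firstHrefAndTitle(html, "https://m.gutenberg.org/");
        if (result != null) {
            System.out.println("Link  : " + result[0]);
            System.out.println("Title : " + result[1]);
        }
    }
}
```

Passing a base URI to Jsoup.parse matters here: without one, absUrl("href") has nothing to resolve the relative /ebooks/4300.mobile against and returns an empty string.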