java web-scraping jsoup

Trying to make a Java web scraper


I am trying to make a Java web scraper, but I got lost somewhere in the code. What needs to be done to extract just the name, email, and phone number from the given web page and export them to plain text?

I am also using the jsoup library for this.

But I can't figure out how to achieve this.

This is what my code looks like:

package javaapplication6;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

import java.io.IOException; // thrown by Jsoup.connect(...).get()

public class javaapplication6 {

    public javaapplication6() {

        Document doc;

        try {
            doc = Jsoup.connect("http://cs.qau.edu.pk/faculty.php/").get();
        } catch (IOException ioe) {
            ioe.printStackTrace();
            return; // without a document there is nothing to parse
        }

        // All elements with class "tbl", then every <tr> inside them
        // (Elements has select(), not getElementsByTag())
        Elements table = doc.getElementsByClass("tbl");
        Elements rows = table.select("tr");

        for (Element row : rows) {
            Elements tds = row.getElementsByTag("td");
            // Print only the second cell of each row
            if (tds.size() > 1) System.out.println(tds.get(1).text());
        }
    }

    public static void main(String[] args) {
        new javaapplication6();
    }
}

Solution

  • Try the following code to get each faculty member's name and details.

    The connect(String url) method creates a new Connection, and get() fetches and parses the HTML page.

    select() then finds the matching elements, and a for-each loop retrieves each name and its details.

    import org.jsoup.Jsoup;
    import org.jsoup.nodes.Document;
    import org.jsoup.nodes.Element;
    import org.jsoup.select.Elements;

    public class CsDotQauDotEduPk {

        public static void main(String[] args) throws Exception {

            // Fetch and parse the faculty page
            Document doc = Jsoup.connect("http://cs.qau.edu.pk/faculty.php/").get();

            // Every table cell inside a div with class "container"
            Elements ele = doc.select("div.container td");
            for (Element e : ele) {

                // First <strong> in the cell: the faculty member's name
                Elements el = e.select("strong").eq(0);
                // Second <p> in the cell: the email/phone details
                Elements e2 = e.select("td p").eq(1);

                final String name = el.text();
                final String details = e2.text();

                System.out.println(name + " >> " + details);
            }
        }
    }
    

    Output:

    Dr. Onaiza Maqbool >> Email: [email protected] Phone: +92-51-9064 2060

    Dr. Khalid Saleem >> Email: [email protected] Phone: +92-51-9064 2050

    Dr. Shuaib Karim >> Email: [email protected] Phone: +92-51-9064 2055

    Dr. Rabeeh Ayaz Abbasi >> Email: [email protected] Phone: +92-51-9064 2050

    Dr. Ghazanfar Farooq >> Assistant Professor

    Dr. Muddassar Azam Sindhu >> Email: [email protected] Phone: +92-51-9064 2066

    Dr. Akmal Saeed Khattak >> Email: [email protected] Phone: +92-51-9064 2161

    Dr. Muhammad Aasim Rafique >>

    Dr. Umer Rasheed >>

    Memoona Afsheen Malik >> Email: [email protected] Phone: +92-51-9064 2064

    Ifrah Farrukh Khan >> Email: ([email protected]) Phone: +92-51-9064 2005

    S. M. Naqi >> Email: [email protected] Phone: +92-51-9064 2059
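The code above prints each name with its raw details, but the question also asks for separate email and phone fields exported to plain text. A minimal JDK-only sketch of that last step, assuming the details text keeps the "Email: ... Phone: ..." shape seen in the output. The class name FacultyContactExport, the two regexes, the tab-separated layout and the faculty.txt file name are all assumptions, and the sample records are hypothetical placeholders rather than scraped data.

    import java.io.IOException;
    import java.nio.charset.StandardCharsets;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.nio.file.Paths;
    import java.util.ArrayList;
    import java.util.LinkedHashMap;
    import java.util.List;
    import java.util.Map;
    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    public class FacultyContactExport {

        // Assumption: a simple (not RFC-complete) email pattern is good enough here.
        private static final Pattern EMAIL =
                Pattern.compile("[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\\.[A-Za-z]{2,}");
        // Assumption: after "Phone:" come an optional '+', then digits, spaces
        // and hyphens, ending in a digit.
        private static final Pattern PHONE =
                Pattern.compile("Phone:\\s*(\\+?[0-9][0-9 \\-]*[0-9])");

        /** First email address in the details text, or "" if there is none. */
        static String extractEmail(String details) {
            Matcher m = EMAIL.matcher(details);
            return m.find() ? m.group() : "";
        }

        /** Phone number following "Phone:", or "" if there is none. */
        static String extractPhone(String details) {
            Matcher m = PHONE.matcher(details);
            return m.find() ? m.group(1) : "";
        }

        /** One record as a tab-separated line: name, email, phone. */
        static String toLine(String name, String details) {
            return name + "\t" + extractEmail(details) + "\t" + extractPhone(details);
        }

        public static void main(String[] args) throws IOException {
            // Hypothetical (name, details) pairs standing in for what the
            // jsoup loop produces; the addresses are placeholders.
            Map<String, String> scraped = new LinkedHashMap<>();
            scraped.put("Dr. A", "Email: a@example.edu Phone: +92-51-9064 2060");
            scraped.put("Dr. B", "Assistant Professor");

            List<String> lines = new ArrayList<>();
            lines.add("Name\tEmail\tPhone");
            for (Map.Entry<String, String> entry : scraped.entrySet()) {
                lines.add(toLine(entry.getKey(), entry.getValue()));
            }

            // Hypothetical output file; each line is written with a line separator
            Path out = Paths.get("faculty.txt");
            Files.write(out, lines, StandardCharsets.UTF_8);
            System.out.println("Wrote " + (lines.size() - 1) + " records to " + out.toAbsolutePath());
        }
    }

In the scraper itself, the sample map would be replaced by collecting toLine(name, details) inside the for-each loop, using el.text() and e2.text(). Entries without contact details, such as "Assistant Professor", simply produce empty email and phone columns.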