I am trying to write a Java web scraper, but I got lost somewhere in the code. What do I need to do to extract just the name, email, and phone number from the given web page and export them to plain text?
I am using the jsoup library for this,
but I can't figure out how to achieve this task.
This is what my code looks like:
package javaapplication6;

import org.jsoup.*;
import org.jsoup.helper.*;
import org.jsoup.nodes.*;
import org.jsoup.select.*;
import java.io.*; // Only needed if scraping a local File.

public class javaapplication6 {

    public javaapplication6() {
        Document doc = null;
        try {
            doc = Jsoup.connect("http://cs.qau.edu.pk/faculty.php/").get();
        } catch (IOException ioe) {
            ioe.printStackTrace();
        }
        Elements table = doc.getElementsByClass("tbl");
        Elements rows = table.getElementsByTag("TR");
        for (Element row : rows) {
            Elements tds = row.getElementsByTag("TD");
            for (int i = 0; i < tds.size(); i++) {
                if (i == 1) System.out.println(tds.get(i).text());
            }
        }
    }

    public static void main (String args[]) {
        new javaapplication6();
    }
}
Try the following code to get the name and details.
The connect(String url) method creates a new Connection, and get() fetches and parses the HTML page.
We then select the matching elements and use a for-each loop to retrieve the name and details from each one.
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class CsDotQauDotEduPk {
    public static void main(String[] args) throws Exception {
        // Fetch and parse the faculty page.
        Document doc = Jsoup.connect("http://cs.qau.edu.pk/faculty.php/").get();
        // Select every table cell inside div.container.
        Elements ele = doc.select("div.container td");
        for (Element e : ele) {
            // The first <strong> in the cell is the name.
            Elements el = e.select("strong").eq(0);
            // The second <p> in the cell usually holds the email and phone.
            Elements e2 = e.select("td p").eq(1);
            final String name = el.text();
            final String details = e2.text();
            System.out.println(name + " >> " + details);
        }
    }
}
Output:
Dr. Onaiza Maqbool >> Email: [email protected] Phone: +92-51-9064 2060
Dr. Khalid Saleem >> Email: [email protected] Phone: +92-51-9064 2050
Dr. Shuaib Karim >> Email: [email protected] Phone: +92-51-9064 2055
Dr. Rabeeh Ayaz Abbasi >> Email: [email protected] Phone: +92-51-9064 2050
Dr. Ghazanfar Farooq >> Assistant Professor
Dr. Muddassar Azam Sindhu >> Email: [email protected] Phone: +92-51-9064 2066
Dr. Akmal Saeed Khattak >> Email: [email protected] Phone: +92-51-9064 2161
Dr. Muhammad Aasim Rafique >>
Dr. Umer Rasheed >>
Memoona Afsheen Malik >> Email: [email protected] Phone: +92-51-9064 2064
Ifrah Farrukh Khan >> Email: ([email protected]) Phone: +92-51-9064 2005
S. M. Naqi >> Email: [email protected] Phone: +92-51-9064 2059
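The output above still keeps the email and phone in one string, and it is only printed to the console. To split the details and export them to plain text, something like the following might work. It is only a sketch using the standard library: the class FacultyExport, the methods extract and toLine, the file name faculty.txt, and the sample name and address are all made up for illustration, and it assumes the details text has the form "Email: ... Phone: ..." as in the output above.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper (not part of jsoup): splits a details string and writes rows to a text file.
class FacultyExport {

    // Assumption: details look like "Email: <address> Phone: <number>"; the address may be in parentheses.
    private static final Pattern EMAIL = Pattern.compile("Email:\\s*\\(?([^\\s()]+)\\)?");
    private static final Pattern PHONE = Pattern.compile("Phone:\\s*(.+)$");

    // Returns the first capture group of the pattern, or "" when the field is missing.
    static String extract(Pattern pattern, String details) {
        Matcher m = pattern.matcher(details);
        return m.find() ? m.group(1).trim() : "";
    }

    // Builds one tab-separated line: name, email, phone.
    static String toLine(String name, String details) {
        return name + "\t" + extract(EMAIL, details) + "\t" + extract(PHONE, details);
    }

    public static void main(String[] args) throws IOException {
        // Sample values standing in for name and details from the scraping loop (made-up data).
        List<String> lines = new ArrayList<>();
        lines.add(toLine("Dr. Example One", "Email: someone@example.com Phone: +92-51-9064 2060"));
        lines.add(toLine("Dr. Example Two", "Assistant Professor"));

        // Write all lines to a plain text file in the working directory.
        Files.write(Paths.get("faculty.txt"), lines, StandardCharsets.UTF_8);
    }
}
```

In the scraping loop you would call lines.add(toLine(name, details)) instead of System.out.println, and write the file once after the loop. Entries without an email or phone simply get empty fields.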