Hi i need to scrape a web site using JSOUP and i needed to get output in key- value pairs can anyone suggest me.
The url which i need to scrape is https://www.cpsc.gov/Recalls?field_rc_date_value%5Bmin%5D&field_rc_date_value%5Bmax%5D&field_rc_heading_value=&field_rc_hazard_description_value=&field_rc_manufactured_in_value=&field_rc_manufacturers_value=&field_rc_number_value=
The code which i written is:
package com.jaysons;
import java.io.IOException;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;
public class ScrapeBody {
public static void main( String[] args ) throws IOException{
String url = "https://www.cpsc.gov/Recalls?field_rc_date_value%5Bmin%5D&field_rc_date_value%5Bmax%5D&field_rc_heading_value=&field_rc_hazard_description_value=&field_rc_manufactured_in_value=&field_rc_manufacturers_value=&field_rc_number_value=";
Document doc = Jsoup.connect(url).get();
Elements content = doc.select("div.views-field views-field-php");
doc = Jsoup.parse( content.html().replaceAll("</div>", "</div><span>")
.replaceAll("<div", "</span><div") );
Elements labels = doc.select("div.remedy");
for (Element label : labels) {
System.out.println(String.format("%s %s", label.text().trim(),
label.nextElementSibling().text()));
}
}
}
i need output in key value pairs like
Date:OCTOBER 20, 2017
remedy:
units:
website:http://www.bosch-home.com/us
phone:(888) 965-5813
kindly let me know where did i do mistake
Theres no need to reassign and re-parse the value of the content
variable.
Elements content = doc.select("div.views-field >span");
for (Element viewField : content) {
/*
each viewField corresponds to one
<div class="views-field views-field-php">
<span class="field-content">
<a href="/Recalls/2018/BSH-Home-Appliances-amplía-retiro-del-mercado-de-lavavajillas">
<div class="date">
October 20, 2017
</div>
...
</span>
</div>
*/
Elements divs = viewField.getElementsByTag("div");
for (Element div : divs) {
String className = div.className();
if (className.equals("date")) {
// store and extract date
} else if (className.equals("...")) {
// do something else
} // else...
}
}
Not only you can select subelements by tag, but also by name, by some attributes etc. Check the official documentation for more info: https://jsoup.org/cookbook/extracting-data/dom-navigation
Disclaimer: I could not test the code right now.