I am writing a web scraper for personal use in gaming. This is the page I want to scrape: http://forum.toribash.com/clan_war.php?clanid=139
I want to count how often each name appears under "show details".
I have read "Get content from javascript onClick hyperlink", but I am not sure it covers what I am looking for. I doubt it does, and in any case I have not tried the answer to that question, since I have no idea how to adapt https://stackoverflow.com/a/12268561/10467473 to what I want.
BufferedReader month = new BufferedReader(new InputStreamReader(System.in));
String mth = month.readLine();
// Access the website
Document docs = Jsoup.connect("http://forum.toribash.com/clan_war.php?clanid=139").get();
// Take every entry of the war history
Elements collection = docs.getElementsByClass("war_history_entry");
// Iterate over every entry
for (Element e : collection) {
    // Only use the entry if its info matches the month being searched for
    if (e.getElementsByClass("war_info").text().split(" ")[1].equalsIgnoreCase(mth)) {
        // Supposedly this holds every element with class "player" behind the button's onclick,
        // but it doesn't work
        Elements cek = e.getElementsByClass("player");
        for (Element c : cek) {
            System.out.println(c.text());
        }
    }
}
For now I expect to get at least the names in the show details table:
Kaito
Chax
Draku
and so on
This page doesn't contain the information you want to scrape. The results are loaded by AJAX (JavaScript) after the button is clicked. You can open the Network tab of your web browser's developer tools to see what happens when you click the button. Clicking a button
<button id="buttonwarid19557" ... >
loads a table from URL:
http://forum.toribash.com/clan_war_ajax.php?warid=19557&clanid=139
Notice that the same id number (19557) appears in both the button id and the URL.
What you have to do is to get the id from every button, then GET another document for each of these buttons and parse it one by one. That's what your web browser does anyway.
BufferedReader month = new BufferedReader(new InputStreamReader(System.in));
String mth = month.readLine();
// Access the website
Document docs = Jsoup.connect("http://forum.toribash.com/clan_war.php?clanid=139").get();
// Take every entry of the war history
Elements collection = docs.getElementsByClass("war_history_entry");
// Iterate over every entry
for (Element e : collection) {
    // Only use the entry if its info matches the month being searched for
    if (e.getElementsByClass("war_info").text().split(" ")[1].equalsIgnoreCase(mth)) {
        // select the button
        Element button = e.selectFirst("button");
        // get the war id from the button id
        String buttonId = button.attr("id");
        // remove the text prefix because we only need the number
        String warId = buttonId.replace("buttonwarid", "");
        System.out.println("downloading results for " + e.getElementsByClass("war_info").text());
        // download and parse the subpage containing the table with info about a single war;
        // the referrer makes the request look more like it comes from a real web browser, to avoid possible hotlinking protection
        Document table = Jsoup.connect("http://forum.toribash.com/clan_war_ajax.php?warid=" + warId + "&clanid=139").referrer("http://forum.toribash.com/clan_war.php?clanid=139").get();
        // get every <td class="player"> ... </td>
        Elements players = table.select(".player");
        for (Element player : players) {
            System.out.println(player.text());
        }
    }
}
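The code above only prints the names. Since the goal is to count how often each name appears, here is a minimal sketch of that last step, assuming the names have first been collected into a list (for example by adding player.text() to it instead of printing). The class NameFrequency and the method countNames are hypothetical helpers, not part of Jsoup, and the sample list in main is made-up input for illustration only.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Jsoup: counts occurrences of each name.
class NameFrequency {

    // Returns name -> number of occurrences, in first-seen order.
    static Map<String, Integer> countNames(List<String> names) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String name : names) {
            String key = name.trim();
            if (key.isEmpty()) {
                continue; // skip blank cells
            }
            // start at 1 for a new name, otherwise add 1 to the current count
            counts.merge(key, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Made-up sample input; in the scraper these would come from player.text()
        List<String> names = Arrays.asList("Kaito", "Chax", "Draku", "Kaito");
        countNames(names).forEach((name, count) -> System.out.println(name + ": " + count));
    }
}
```

With the sample list this prints "Kaito: 2", "Chax: 1" and "Draku: 1". To use it with the scraper, declare a List<String> before the outer loop, add each player.text() to it inside the inner loop, and call countNames once after the outer loop finishes.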