My objective is to scrape data using Java Selenium. I am able to load the Selenium driver, connect to the website, fetch the first column, click the next pagination button until it becomes disabled, and write the results to the console. Here is what I have so far:
```java
import java.util.ArrayList;
import java.util.List;

import org.openqa.selenium.By;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.WebElement;
import org.openqa.selenium.chrome.ChromeDriver;

public static WebDriver driver;

public static void main(String[] args) throws Exception {
    System.setProperty("webdriver.chrome.driver", "E:\\eclipse-workspace\\package-name\\src\\working\\selenium\\driver\\chromedriver.exe");
    System.setProperty("webdriver.chrome.silentOutput", "true");
    driver = new ChromeDriver();
    driver.get("https://datatables.net/examples/basic_init/zero_configuration.html");
    driver.manage().window().maximize();
    compareDisplayedRowCountToActualRowCount();
}

public static void compareDisplayedRowCountToActualRowCount() throws Exception {
    try {
        Thread.sleep(5000);
        List<WebElement> namesElements = driver.findElements(By.cssSelector("#example>tbody>tr>td:nth-child(1)"));
        System.out.println("size of names elements : " + namesElements.size());
        List<String> names = new ArrayList<String>();
        // Add the column 1 elements to the list
        for (WebElement nameEle : namesElements) {
            names.add(nameEle.getText());
        }
        // Display the list elements on the console
        for (String name : names) {
            System.out.println(name);
        }
        // Locate the Next button
        String nextButtonClass = driver.findElement(By.id("example_next")).getAttribute("class");
        // Traverse the table page by page until the Next button is disabled,
        // adding names to the list defined above
        while (!nextButtonClass.contains("disabled")) {
            driver.findElement(By.id("example_next")).click();
            Thread.sleep(1000);
            namesElements = driver.findElements(By.cssSelector("#example>tbody>tr>td:nth-child(1)"));
            for (WebElement nameEle : namesElements) {
                names.add(nameEle.getText());
            }
            nextButtonClass = driver.findElement(By.id("example_next")).getAttribute("class");
        }
        // Print all the list elements
        for (String name : names) {
            System.out.println(name);
        }
        // Count the size of the list
        int actualCount = names.size();
        System.out.println("Total number of names :" + actualCount);
        // Locate the displayed count
        String displayedCountString = driver.findElement(By.id("example_info")).getText().split(" ")[5];
        int displayedCount = Integer.parseInt(displayedCountString);
        System.out.println("Total Number of Displayed Names count:" + displayedCount);
        Thread.sleep(1000);
        // Compare the calculated count with the displayed count
        if (actualCount == displayedCount) {
            System.out.println("Actual row count = Displayed row Count");
        } else {
            System.out.println("Actual row count != Displayed row Count");
            throw new Exception("Actual row count != Displayed row Count");
        }
    } catch (Exception e) {
        e.printStackTrace();
    }
}
```
I want to scrape every column of the table, not just the first one, and write the data to a CSV file.
Update
I tried the following, but it doesn't work:
```java
for (WebElement trElement : tr_collection) {
    int col_num = 1;
    List<WebElement> td_collection = trElement.findElements(
        By.xpath("//*[@id=\"example\"]/tbody/tr[rown_num]/td[col_num]")
    );
    for (WebElement tdElement : td_collection) {
        rows += tdElement.getText() + "\t";
        col_num++;
    }
    rows = rows + "\n";
    row_num++;
}
```
Scraping: When I want to gather list elements, I usually select them by XPath instead of CSS selector. The XPath structure is usually clearer, and each element is identified by just one or two integer indices.
So for your example, where you want to find the names, take one element's XPath and the XPath of the next element in the list, and see which value differs:
The first name, 'Airi Satou', is found at the following XPath:
//*[@id="example"]/tbody/tr[1]/td[1]
Airi's position has the following XPath:
//*[@id="example"]/tbody/tr[1]/td[2]
You can see that within a row, the XPath for each piece of information differs only in the 'td' index, which selects the column.
The next name in the list, 'Angela Ramos', is found at:
//*[@id="example"]/tbody/tr[2]/td[1]
And Angela's position is found at:
//*[@id="example"]/tbody/tr[2]/td[2]
You can see that between rows, the difference is controlled by the 'tr' index, which selects the row.
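By iterating over the 'tr' and 'td' indices, you can get the whole table. Keep in mind that the tbody only contains the rows of the page currently displayed, so you still need the pagination loop from your code to reach every row.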
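The index pattern above can be sketched in plain Java. This is a minimal sketch, not a tested scraper: `TableXPaths`, `cellXPath` and `allCellXPaths` are hypothetical names, and the table id `example` comes from the XPaths above. In Selenium, each generated string would be passed to `driver.findElement(By.xpath(...)).getText()`; the sketch only builds and prints the strings so it runs without a browser.

```java
import java.util.ArrayList;
import java.util.List;

public class TableXPaths {

    // Builds the XPath of one body cell. XPath positions are 1-based:
    // tr[n] selects the n-th row, td[n] selects the n-th cell within that row.
    static String cellXPath(String tableId, int row, int col) {
        return "//*[@id=\"" + tableId + "\"]/tbody/tr[" + row + "]/td[" + col + "]";
    }

    // Generates the XPaths for every cell of a rows x cols table, row by row.
    static List<String> allCellXPaths(String tableId, int rows, int cols) {
        List<String> xpaths = new ArrayList<>();
        for (int row = 1; row <= rows; row++) {       // outer loop: the tr index
            for (int col = 1; col <= cols; col++) {   // inner loop: the td index
                xpaths.add(cellXPath(tableId, row, col));
            }
        }
        return xpaths;
    }

    public static void main(String[] args) {
        // First two rows, first two columns (name and position).
        for (String xpath : allCellXPaths("example", 2, 2)) {
            System.out.println(xpath);
        }
    }
}
```

Running `main` prints the four XPaths listed above in row-major order: Airi's name, Airi's position, Angela's name, Angela's position.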
As for writing to a CSV, there are some solid Java libraries for that. I think a straightforward example to follow is here: Java - Writing strings to a CSV file
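If you would rather not add a library, a minimal CSV writer only needs the standard library. This sketch assumes the scraped data is held as a `List<List<String>>` with one inner list per table row; `CsvSketch`, `escape` and `toCsv` are hypothetical names. It follows the RFC 4180 quoting rules (a field containing a comma, double quote or line break is wrapped in double quotes, and embedded quotes are doubled), but ends lines with `\n` rather than CRLF.

```java
import java.util.List;

public class CsvSketch {

    // Wraps the field in double quotes only when it contains a comma,
    // a double quote or a line break; embedded quotes are doubled.
    static String escape(String field) {
        boolean needsQuotes = field.contains(",") || field.contains("\"")
                || field.contains("\n") || field.contains("\r");
        if (!needsQuotes) {
            return field;
        }
        return "\"" + field.replace("\"", "\"\"") + "\"";
    }

    // Joins each row's escaped fields with commas, one "\n"-terminated line per row.
    static String toCsv(List<List<String>> rows) {
        StringBuilder csv = new StringBuilder();
        for (List<String> row : rows) {
            for (int i = 0; i < row.size(); i++) {
                if (i > 0) {
                    csv.append(',');
                }
                csv.append(escape(row.get(i)));
            }
            csv.append('\n');
        }
        return csv.toString();
    }

    public static void main(String[] args) {
        // Sample data for illustration only, not values scraped from the page.
        List<List<String>> rows = List.of(
                List.of("Name", "Note"),
                List.of("Airi Satou", "contains, a comma"),
                List.of("Angela Ramos", "says \"hi\""));
        System.out.print(toCsv(rows));
    }
}
```

To save the result, something like `Files.writeString(Path.of("table.csv"), CsvSketch.toCsv(rows))` (Java 11+, `java.nio.file`) would write it to disk; `table.csv` is an arbitrary file name.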
UPDATE: @User169 It looks like you're gathering a list of elements for each row in the table. Instead, build each cell's XPath one at a time from its row and column index. In your attempt, rown_num and col_num sit inside the string literal, so the XPath never contains the actual numbers. Try this, then extend it to get each cell's text and save it to an array.
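```java
int total_rows = driver.findElements(By.xpath("//*[@id=\"example\"]/tbody/tr")).size();
int total_col = driver.findElements(By.xpath("//*[@id=\"example\"]/tbody/tr[1]/td")).size();
// XPath indices start at 1, so loop up to and including the totals
for (int row_num = 1; row_num <= total_rows; row_num++) {
    for (int col_num = 1; col_num <= total_col; col_num++) {
        WebElement info = driver.findElement(
                By.xpath("//*[@id=\"example\"]/tbody/tr[" + row_num + "]/td[" + col_num + "]"));
    }
}
```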
I haven't tested it so it may need a few small changes.
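One way to take the next step (get the text and save it to an array) is to fill a `String[][]` inside the nested loop. In this sketch the cell lookup is passed in as a `BiFunction<Integer, Integer, String>`, so the loop can be exercised without a browser; `TableToArray` and `readTable` are hypothetical names, and the Selenium lambda shown in the comment is an assumption about how you might wire it up with the XPath pattern from the loop above.

```java
import java.util.function.BiFunction;

public class TableToArray {

    // Reads a totalRows x totalCols table into a String[][].
    // cellText receives 1-based (row, col) indices, matching XPath positions.
    static String[][] readTable(int totalRows, int totalCols,
                                BiFunction<Integer, Integer, String> cellText) {
        String[][] table = new String[totalRows][totalCols];
        for (int row = 1; row <= totalRows; row++) {        // tr index
            for (int col = 1; col <= totalCols; col++) {    // td index
                // Java arrays are 0-based, so shift both indices down by one.
                table[row - 1][col - 1] = cellText.apply(row, col);
            }
        }
        return table;
    }

    public static void main(String[] args) {
        // With Selenium the lookup could be (assumption, untested):
        // (r, c) -> driver.findElement(By.xpath(
        //         "//*[@id=\"example\"]/tbody/tr[" + r + "]/td[" + c + "]")).getText()
        // A fake lookup stands in here so the sketch runs without a browser.
        String[][] table = readTable(2, 3, (r, c) -> "r" + r + "c" + c);
        for (String[] row : table) {
            System.out.println(String.join("\t", row));
        }
    }
}
```

Passing the lookup in as a function keeps the index bookkeeping (1-based XPath positions versus 0-based arrays) separate from the browser code, and the resulting array can be handed straight to a CSV writer.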