java csv jsoup

JSoup HTML Parse and Write Results to CSV In Order


I am trying to find the best approach to saving data I have parsed out of an HTML document with Jsoup into a CSV file. I am using [CSVWriter][1] (https://mvnrepository.com/artifact/com.opencsv/opencsv/4.6) and cannot work out how to write the data with it. Please see my code snippet below. Each infobox is one listing record, and the individual fields sit inside it. CSVWriter takes a String array for each row, but I am having trouble getting from Jsoup Elements to a String array that it can write.

The Jsoup selector returns every element that matches the selection. For instance, when I select the name, it returns all 9 names if there are 9 records on the page. I need to group this data per record so that each record prints as one row in the CSV.

InfoBox > Name| Email| Phone| Website

The problem is the way I am trying to write the data on the line below:

writer.writeAll((Iterable<String[]>) infoArray);

This does not work and fails with an error, but it shows roughly what I am after. Is anybody familiar with writing data from Jsoup Elements into a CSV? Thanks

String filePath ="c:/results.csv";
                // first create file object for file placed at location
                // specified by filepath
                File file = new File(filePath);
                try {
                    // create FileWriter object with file as parameter
                    FileWriter outputfile = new FileWriter(file);

                    // create CSVWriter object filewriter object as parameter
                    CSVWriter writer = new CSVWriter(outputfile);

                    String[] header = { "Name", "Phone", "Street","State","City","Zipcode" };
                    Elements infobox = doc.select(".info");
                    List<String> infoArray = new ArrayList<>();

                    for(int i = 0; i < infobox.size(); i++){

                        infobox.get(i).select(".business-name > span");

                        infoArray.add(infobox.get(i).select(".business-name > span").text());
                        infoArray.add(infobox.get(i).select(".phones.phone.primary").text());
                        infoArray.add(infobox.get(i).select(".street-address").text());
                        infoArray.add(infobox.get(i).select(".state").text());
                        infoArray.add(infobox.get(i).select(".city").text());
                        infoArray.add(infobox.get(i).select(".zip").text());


                    }


                    writer.writeNext(header);
                    // How can the data be written so that each record lands in its own row?
                    // Each value should go under its corresponding header, one record per row, e.g.
                    // name, phone, street
                    writer.writeAll((Iterable<String[]>) infoArray);
                    for(String ia : infoArray){


                    }

                    // closing writer connection
                    writer.close();
                }
                catch (IOException e) {
                    // TODO Auto-generated catch block
                    e.printStackTrace();
                }
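
The cast on the `writeAll` line cannot work because `infoArray` is a flat `List<String>`: every field of every record sits in one long sequence, while `writeAll` is being asked for one `String[]` per row. As a minimal sketch, assuming every record contributes the same number of values in header order, the flat list can be regrouped into rows with plain Java. The names `RowChunker` and `toRows` are made up for illustration and are not part of OpenCSV or Jsoup.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class RowChunker {

    // Splits a flat list of field values into rows of `width` columns each.
    // Assumes the list was filled record by record, in column order.
    public static List<String[]> toRows(List<String> flat, int width) {
        if (width <= 0 || flat.size() % width != 0) {
            throw new IllegalArgumentException(
                    "flat list size must be a multiple of width");
        }
        List<String[]> rows = new ArrayList<>();
        for (int start = 0; start < flat.size(); start += width) {
            rows.add(flat.subList(start, start + width).toArray(new String[0]));
        }
        return rows;
    }

    public static void main(String[] args) {
        // Made-up sample values standing in for scraped fields.
        List<String> flat = Arrays.asList(
                "Acme", "555-0100", "1 Main St",
                "Bolt", "555-0101", "2 Oak Ave");
        for (String[] row : toRows(flat, 3)) {
            System.out.println(String.join(" | ", row));
        }
    }
}
```

The resulting `List<String[]>` has the shape the `writeAll` call expects. Building one `String[]` per record inside the scraping loop avoids the regrouping step altogether.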

Solution

  • Here's what ended up working for me. The problem was that I was not putting each record's Strings into a String array before passing them to CSVWriter. Here is my example.

    // Note: doc, writer (a CSVWriter) and count are defined outside this snippet.
    try {
        String[] header = { "Name", "Phone", "Street", "State", "City", "Zipcode" };
        Elements infobox = doc.select(".info");

        // Write the header only when count is 0
        if (count == 0) {
            writer.writeNext(header);
        }

        // Each .info element is one record, so build one row per element
        for (Element info : infobox) {
            String businessName = info.select(".business-name > span").text();
            String phone = info.select(".phones.phone.primary").text();
            String address = info.select(".street-address").text();
            // Address seems to be displayed another way too
            String address2 = info.select(".adr").text();
            // Use regular expression to normalize data

            // Only three of the six header columns are filled here; add the
            // remaining values (or trim the header) so rows line up with it
            String[] columns = new String[] { businessName, phone, address };

            writer.writeNext(columns);
        }

        writer.close();
    } catch (IOException e) {
        e.printStackTrace();
    }
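
One thing to watch in the snippet above: the header names six columns while each row carries three values. A possible way to keep rows in step with the header, sketched here with plain Java, is to collect each record's fields in a map keyed by column name and build the row from the header. `HeaderAlignedRows` and `toRow` are hypothetical names, and filling missing fields with an empty string is an assumption, not something the original answer does.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class HeaderAlignedRows {

    // Builds one CSV row whose cells follow the header's column order.
    // Fields missing from the record become empty strings (an assumption).
    public static String[] toRow(String[] header, Map<String, String> record) {
        String[] row = new String[header.length];
        for (int i = 0; i < header.length; i++) {
            row[i] = record.getOrDefault(header[i], "");
        }
        return row;
    }

    public static void main(String[] args) {
        String[] header = { "Name", "Phone", "Street", "State", "City", "Zipcode" };
        // Made-up sample record; in the scraper these values would come from
        // the Jsoup selectors applied to one .info element.
        Map<String, String> record = new LinkedHashMap<>();
        record.put("Phone", "555-0100");
        record.put("Name", "Acme");
        record.put("Street", "1 Main St");
        String[] row = toRow(header, record);
        System.out.println(String.join(",", row));
    }
}
```

Each returned array can then be passed to `writer.writeNext(...)` in the same way as `columns` in the loop above, so the insertion order of the fields no longer matters.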