
Parsing an entire CSV file vs. parsing line by line in Java


I have a fairly large CSV file, approximately 80K to 120K rows (depending on the day). I'm successfully running code that parses the entire CSV file into Java objects using the @CsvBindByName annotation. Sample code:

    Reader reader = Files.newBufferedReader(Paths.get(file));
    CsvToBean<MyCustomClass> csvToBean = new CsvToBeanBuilder<MyCustomClass>(reader)
            .withType(MyCustomClass.class)
            .withIgnoreLeadingWhiteSpace(true)
            .build();
    // parse() reads the whole file and returns every bean at once
    List<MyCustomClass> myCustomClass = csvToBean.parse();

I want to change this code to parse the CSV file line by line instead of all at once, while retaining the neatness of mapping each row to a Java bean. Essentially something like this:

    CSVReader csvReader = new CSVReader(Files.newBufferedReader(Paths.get(csvFileLoc)));
    String[] headerRow = csvReader.readNext(); // save the headerRow
    String[] nextLine;
    while ((nextLine = csvReader.readNext()) != null) {
        // create a fresh bean for each row rather than reusing a single instance
        MyCustomClass myCustomClass = new MyCustomClass();
        myCustomClass.setField1(nextLine[0]);
        myCustomClass.setField2(nextLine[1]);
        //.... so on
    }

But the above solution ties me to knowing the column position of each field. What I would like is to map the string array I get for each row onto the bean based on the header row, the same way opencsv does when it parses the entire file. As far as I can tell, I am not able to do that with opencsv. I had assumed this would be a pretty common practice, but I am unable to find any references to it online.

It could be that I am not understanding the CsvToBean usage in opencsv correctly. I could use csvToBean.iterator() to iterate over the beans, but I think the entire CSV file is loaded into memory by the build() method, which would defeat the purpose of reading line by line. Any suggestions are welcome.


Solution

  • Looking at the API docs, CsvToBean<T> implements Iterable<T>, and its iterator() method returns an Iterator<T> that is documented as follows:

    The iterator returned by this method takes one line of input at a time and returns one bean at a time.

    So it looks like you could just write your loop as:

    for (MyCustomClass myCustomClass : csvToBean) {
        // . . . do something with the bean . . .
    }
    

    To clear up the confusion about memory use: you can see in the source code that the build() method of CsvToBeanBuilder only creates the CsvToBean object and does not read any input. The reading is done by the parse() method, which returns all beans as a list, and by the iterator of the CsvToBean object, which reads input as you iterate.
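
To make the "one line of input at a time" idea concrete, here is a minimal library-free sketch of an Iterable that maps each row to its header names while reading lazily. This is not how opencsv is implemented; the class name LazyCsvRows is hypothetical, and the sketch assumes a plain comma-separated file with no quoted or escaped fields.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;
import java.util.NoSuchElementException;

/**
 * Hypothetical illustration (not part of opencsv): yields one header-keyed
 * row per input line. Assumes simple comma-separated data without quoting.
 */
public class LazyCsvRows implements Iterable<Map<String, String>> {
    private final BufferedReader reader;
    private final String[] header;

    public LazyCsvRows(Reader source) {
        this.reader = new BufferedReader(source);
        try {
            // Only the header row is read up front.
            String headerLine = reader.readLine();
            if (headerLine == null) {
                throw new IllegalArgumentException("CSV input has no header row");
            }
            this.header = headerLine.split(",", -1);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Single-pass: every iterator shares the same underlying reader. */
    @Override
    public Iterator<Map<String, String>> iterator() {
        return new Iterator<Map<String, String>>() {
            // Holds exactly one line of look-ahead, never the whole file.
            private String nextLine = fetchLine();

            private String fetchLine() {
                try {
                    return reader.readLine();
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            }

            @Override
            public boolean hasNext() {
                return nextLine != null;
            }

            @Override
            public Map<String, String> next() {
                if (nextLine == null) {
                    throw new NoSuchElementException();
                }
                String[] cells = nextLine.split(",", -1);
                Map<String, String> row = new HashMap<>();
                // Map by header name, so callers never depend on column positions.
                for (int i = 0; i < header.length && i < cells.length; i++) {
                    row.put(header[i].trim(), cells[i].trim());
                }
                nextLine = fetchLine();
                return row;
            }
        };
    }
}
```

Looping over it with for (Map<String, String> row : new LazyCsvRows(reader)) keeps only the current row and one line of look-ahead in memory, and each value is looked up by column name, for example row.get("field1"). With opencsv itself you would not need any of this, since iterating over the CsvToBean object gives you the beans directly.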