java, opencsv

Manipulating a BufferedReader before OpenCSV's CSVReaderBuilder reads it results in a null reader


I am reading a CSV file with OpenCSV's CSVReaderBuilder, which doesn't work because the CSV file, for some weird reason that I cannot change, has some lines with a missing column.

So I thought it would be a good idea to manipulate the BufferedReader I use as input for the CSVReaderBuilder and add an extra column before CSVReaderBuilder reads it, but unfortunately the resulting reader is always null.

This code results in a com.opencsv.exceptions.CsvRequiredFieldEmptyException because the lines have different numbers of columns, but it works with a proper CSV file:

        FileInputStream is;
        try {
            is = new FileInputStream(fileName);
            InputStreamReader isr = new InputStreamReader(is, charSet);
            BufferedReader buffReader = new BufferedReader(isr);

            // use a custom CSVParser to set the separator
            final CSVParser parser = new CSVParserBuilder()
                    .withSeparator(separator)
                    .build();

            // build the CSVReader so that it uses the custom CSVParser
            reader = new CSVReaderBuilder(buffReader)
                    .withCSVParser(parser)
                    .build();


        } catch (FileNotFoundException e) {
            // TODO Auto-generated catch block
            e.printStackTrace();
        }

So I added code that manipulates the BufferedReader, appending an extra semicolon when the column count is 13 instead of 14, but this results in reader being null.

        FileInputStream is;
        try {
            is = new FileInputStream(fileName);
            InputStreamReader isr = new InputStreamReader(is, charSet);
            BufferedReader buffReader = new BufferedReader(isr);

            buffReader.lines().forEach(t -> {
                String a[] = t.split(";");
                int occurence = a.length;

                if(occurence == 13) {
                    t = t.concat(";");
                }
            });         

            // use a custom CSVParser to set the separator
            final CSVParser parser = new CSVParserBuilder()
                    .withSeparator(separator)
                    .build();

            // build the CSVReader so that it uses the custom CSVParser
            reader = new CSVReaderBuilder(buffReader)
                    .withCSVParser(parser)
                    .build();


        } catch (FileNotFoundException e) {
            // TODO Auto-generated catch block
            e.printStackTrace();
        }

Does anyone have an idea what I'm doing wrong here?


Solution

    There are a couple of problems here:

    First, by the time buffReader is passed to new CSVReaderBuilder(buffReader), it has already been read to the end by buffReader.lines().forEach(...). In general, a BufferedReader can be read only once, so the CSVReader built on it finds no remaining input and readNext() returns null straight away, which is probably the null you are seeing. Ordinarily, one fix would be to open a new InputStreamReader and BufferedReader on the same file, except that in this case you would run into the second problem.
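    A minimal, self-contained sketch of the first problem, using a StringReader in place of the file (ConsumedReaderDemo and readLineAfterForEach are hypothetical names): once lines().forEach(...) has run, the same BufferedReader has nothing left to return, which is the state the CSVReader would start from.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;

// Hypothetical demo class; a StringReader stands in for the CSV file.
class ConsumedReaderDemo {

    // Runs lines().forEach(...) like the question does, then tries to read again.
    static String readLineAfterForEach(String text) {
        try (BufferedReader reader = new BufferedReader(new StringReader(text))) {
            reader.lines().forEach(line -> { /* stand-in for the question's loop body */ });
            return reader.readLine(); // input is already exhausted, so this is null
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(readLineAfterForEach("a;b\nc;d")); // prints "null"
    }
}
```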

    Second, the line t = t.concat(";"); does not do what you expect. String.concat returns a new String (Java strings are immutable), and the assignment only rebinds the lambda parameter t, which is never used again. It changes neither the file nor the data the reader will return.
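    The difference between reassigning the lambda parameter and actually keeping the new value can be seen in isolation. This sketch (hypothetical class ReassignDemo, working on an in-memory list rather than a file) contrasts the question's forEach pattern with Stream.map, which passes each new String on to the next stage:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical demo class contrasting the question's forEach with a map-based version.
class ReassignDemo {

    // Same pattern as the question: the assignment only rebinds the lambda parameter t.
    static List<String> forEachReassign(List<String> lines) {
        List<String> copy = new ArrayList<>(lines);
        copy.forEach(t -> {
            t = t.concat(";"); // a new String is created and then discarded
        });
        return copy; // still holds the original strings
    }

    // map() keeps each new String, and collect() gathers them into a new list.
    static List<String> mapAppend(List<String> lines) {
        return lines.stream()
                .map(t -> t.concat(";"))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("a;b");
        System.out.println(forEachReassign(lines)); // [a;b]
        System.out.println(mapAppend(lines));       // [a;b;]
    }
}
```

    Note that map alone still does not change what a reader returns; the transformed lines have to be turned back into a Reader, which is what the approaches below are about.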

    How to fix this is less straightforward. As far as I know, this exception will only be thrown when binding the CSV data to a bean, and only if fields are marked as required = true. Given that the source data does not always contain data for the last field, it seems like it should not be marked as required.

    If manipulating the source data really is your only option, I can think of a few possible approaches:

    1. Write the modified data back to a temporary file and then read that file with the CSV parser.
    2. If the CSV file is small enough to fit into memory, you could write the modified data to a StringWriter, construct a StringReader from the result, and parse that.
    3. Do the file content rewriting and CSV parsing in separate threads, using PipedOutputStream and PipedInputStream to connect them.
    4. Write a custom implementation of FilterReader that transforms the file contents as they are read (not the most straightforward to implement).
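    Approach 2 could look roughly like the following sketch. It assumes a single-character separator, that no quoted field contains the separator or a line break, and that blank lines should be left alone; InMemoryPadding, padLine and padShortLines are hypothetical names, not OpenCSV API. Fields are counted with split(..., -1) so that a trailing empty field still counts: the question's t.split(";") drops trailing empty strings, so a line like "a;b;" would be counted as 2 fields rather than 3.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.util.regex.Pattern;

// Hypothetical helper, not part of OpenCSV.
// Assumes the separator never occurs inside a quoted field and no field spans lines.
class InMemoryPadding {

    // Appends separators until the line has expectedColumns fields; blank lines are left alone.
    static String padLine(String line, char separator, int expectedColumns) {
        if (line.isEmpty()) {
            return line;
        }
        // limit -1 keeps trailing empty fields, so "a;b;" counts as 3 fields, not 2
        int fields = line.split(Pattern.quote(String.valueOf(separator)), -1).length;
        StringBuilder padded = new StringBuilder(line);
        for (int i = fields; i < expectedColumns; i++) {
            padded.append(separator);
        }
        return padded.toString();
    }

    // Reads the whole source, writes the repaired lines to a StringWriter,
    // and returns a StringReader over the result. Closes the source when done.
    static Reader padShortLines(Reader source, char separator, int expectedColumns) {
        StringWriter repaired = new StringWriter();
        try (BufferedReader in = new BufferedReader(source)) {
            String line;
            while ((line = in.readLine()) != null) {
                repaired.write(padLine(line, separator, expectedColumns));
                repaired.write('\n');
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new StringReader(repaired.toString());
    }
}
```

    In the question's code this would take the place of buffReader, for example new CSVReaderBuilder(InMemoryPadding.padShortLines(isr, separator, 14)).withCSVParser(parser).build(), with 14 being the expected column count from the question.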

    Details of implementing these approaches would be too long and broad for this answer, so I would suggest creating follow-up questions if needed.
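    That said, a rough outline of approach 4 might look like the sketch below. Unlike an in-memory copy, it pads one line at a time as the consumer reads, so the whole file never has to be held in memory. PaddingReader is a hypothetical name; the sketch assumes a single-character separator and no separators or line breaks inside quoted fields, and it rewrites line endings to \n. Besides read, it overrides skip, ready and the mark methods, because FilterReader would otherwise forward those calls to the untransformed input.

```java
import java.io.BufferedReader;
import java.io.FilterReader;
import java.io.IOException;
import java.io.Reader;
import java.util.regex.Pattern;

// Hypothetical class, not part of OpenCSV. Pads each line to expectedColumns fields
// while it is being read. Assumes the separator never occurs inside quoted fields
// and no field spans multiple lines; line endings are normalised to '\n'.
class PaddingReader extends FilterReader {
    private final BufferedReader lines;
    private final char separator;
    private final int expectedColumns;
    private String pending = ""; // transformed text of the current line
    private int pos = 0;         // index of the next char in pending to hand out

    PaddingReader(Reader source, char separator, int expectedColumns) {
        super(new BufferedReader(source));
        this.lines = (BufferedReader) in; // 'in' is the BufferedReader passed to super
        this.separator = separator;
        this.expectedColumns = expectedColumns;
    }

    // Loads the next padded line into pending if the current one is used up.
    // Returns false once the underlying input is exhausted.
    private boolean fill() throws IOException {
        if (pos < pending.length()) return true;
        String line = lines.readLine();
        if (line == null) return false;
        pending = pad(line) + "\n";
        pos = 0;
        return true;
    }

    private String pad(String line) {
        if (line.isEmpty()) return line; // leave blank lines alone
        // limit -1 keeps trailing empty fields when counting
        int fields = line.split(Pattern.quote(String.valueOf(separator)), -1).length;
        StringBuilder padded = new StringBuilder(line);
        for (int i = fields; i < expectedColumns; i++) {
            padded.append(separator);
        }
        return padded.toString();
    }

    @Override
    public int read() throws IOException {
        return fill() ? pending.charAt(pos++) : -1;
    }

    @Override
    public int read(char[] cbuf, int off, int len) throws IOException {
        if (len == 0) return 0;
        if (!fill()) return -1;
        int n = Math.min(len, pending.length() - pos);
        pending.getChars(pos, pos + n, cbuf, off);
        pos += n;
        return n;
    }

    // Skips over transformed characters rather than raw input.
    @Override
    public long skip(long n) throws IOException {
        long skipped = 0;
        while (skipped < n && fill()) {
            int step = (int) Math.min(n - skipped, pending.length() - pos);
            pos += step;
            skipped += step;
        }
        return skipped;
    }

    @Override public boolean ready() { return pos < pending.length(); }
    @Override public boolean markSupported() { return false; }
    @Override public void mark(int readAheadLimit) throws IOException { throw new IOException("mark() not supported"); }
    @Override public void reset() throws IOException { throw new IOException("reset() not supported"); }
}
```

    It would wrap the InputStreamReader from the question, for example new CSVReaderBuilder(new PaddingReader(isr, separator, 14)).withCSVParser(parser).build().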

    There might be additional options specific to the OpenCSV library that I'm not aware of.