java csv apache-commons-csv

Apache Commons CSV throws an error when the quote and escape characters are the same


Suppose I have a CSV file with the following records:


firstname,lastname
"""amogh""",Kelula

Here, the first name in the first data row should keep its double quotes, so after parsing in Java the record should look like "amogh", Kelula. That is why each literal quote is written as two double quotes: the quote character and the escape character are both the double quote.
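To make the expected decoding concrete, here is a minimal sketch of the doubled-quote rule using only the Java standard library. The class DoubledQuoteDemo and the method parseLine are hypothetical names used for illustration; they are not part of Apache Commons CSV, and the sketch assumes a single-line record with a comma delimiter.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only; not part of Apache Commons CSV.
class DoubledQuoteDemo {

    // Splits one CSV line on commas. Inside a quoted field, two consecutive
    // double quotes are decoded as one literal double quote.
    static List<String> parseLine(String line) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (inQuotes) {
                if (c == '"') {
                    if (i + 1 < line.length() && line.charAt(i + 1) == '"') {
                        current.append('"'); // doubled quote: keep one literal quote
                        i++;                 // skip the second quote of the pair
                    } else {
                        inQuotes = false;    // a lone quote closes the field
                    }
                } else {
                    current.append(c);
                }
            } else if (c == '"') {
                inQuotes = true;             // opening quote of a quoted field
            } else if (c == ',') {
                fields.add(current.toString());
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        fields.add(current.toString());
        return fields;
    }

    public static void main(String[] args) {
        // The data row from the file above.
        System.out.println(parseLine("\"\"\"amogh\"\"\",Kelula"));
    }
}
```

The first of the three leading quotes opens the field, the next two form one literal quote, and the same happens in reverse at the end, so the first field decodes to "amogh" with its quotes kept.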

To parse this CSV file with Apache Commons CSV, I configured CSVFormat as follows:

CSVFormat.DEFAULT.builder()
    .setDelimiter(delimiter)
    .setQuote(quoteCharacter)    // quoteCharacter = "
    .setEscape(escapeCharacter)  // escapeCharacter = "
    .setSkipHeaderRecord(false)
    .setAllowMissingColumnNames(true)
    .setNullString("")
    .build();

This throws an error when reading the row that contains """ in the data. The exception is:

java.io.IOException: (startline 2) EOF reached before encapsulated token finished.

I do not understand why Apache Commons CSV fails to parse this file. Other parsers, such as Papa Parse, parse it without any error. What am I doing wrong here?


Solution

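  • The default CSVFormat (CSVFormat.DEFAULT) should work for triple-quoted values. It already uses " as the quote character and has no escape character set, and it reads a doubled quote inside a quoted field as one literal quote, so there is no need to call setEscape here.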

    for (CSVRecord record : CSVFormat.DEFAULT.parse(reader)) {
        System.out.println(record.toList().toString());
    }
    

    The code above prints the correct values when reader is a Reader over a CSV file with the contents you provided:

    firstname,lastname
    """amogh""",Kelula
    

    I have deleted my old answer for clarity.
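If you still need the builder for the other settings, one option is to keep the configuration from the question and leave out the setEscape call. The sketch below assumes Apache Commons CSV 1.9 or later (for builder() and CSVRecord.toList()) on the classpath, a comma delimiter, and a StringReader standing in for the file. NoEscapeFormatDemo and parse are hypothetical names. A plausible cause of the original error, not verified against a specific library version, is that when the escape character equals the quote character the parser treats the closing quote as the start of an escape sequence and never sees the end of the field.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

import org.apache.commons.csv.CSVFormat;
import org.apache.commons.csv.CSVParser;
import org.apache.commons.csv.CSVRecord;

// Hypothetical demo class; assumes commons-csv 1.9+ is available.
class NoEscapeFormatDemo {

    // Parses the given CSV text and returns every record as a list of field values.
    static List<List<String>> parse(String csv) throws IOException {
        CSVFormat format = CSVFormat.DEFAULT.builder()
                .setDelimiter(',')
                .setQuote('"')
                // No setEscape(...) call: the escape character stays unset,
                // so a doubled quote is read as one literal quote.
                .setSkipHeaderRecord(false)
                .setAllowMissingColumnNames(true)
                .setNullString("")
                .build();

        List<List<String>> rows = new ArrayList<>();
        try (CSVParser parser = format.parse(new StringReader(csv))) {
            for (CSVRecord record : parser) {
                rows.add(record.toList());
            }
        }
        return rows;
    }

    public static void main(String[] args) throws IOException {
        String csv = "firstname,lastname\n\"\"\"amogh\"\"\",Kelula\n";
        for (List<String> row : parse(csv)) {
            System.out.println(row);
        }
    }
}
```

Because no header is configured, the firstname,lastname line comes back as an ordinary record, followed by the data row with the quotes preserved in the first field.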