Suppose I have a CSV file with records as follows:
firstname,lastname
"""amogh""",Kelula
Here, the first name in the data row should itself be enclosed in double quotes, so after parsing in Java the record should look like "amogh", Kelula.
Each literal quote is therefore escaped by doubling it, since the quote character and the escape character are both the double quote.
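The doubling convention described above can be sketched in plain Java. This is only an illustration of how a doubled quote inside a quoted field collapses to one literal quote; it is not how Apache Commons CSV is implemented, and the class and method names (DoubledQuoteSketch, splitLine) are hypothetical. It assumes a comma delimiter and a single-line record.

```java
import java.util.ArrayList;
import java.util.List;

public class DoubledQuoteSketch {

    /** Splits one CSV line into fields; "" inside a quoted field becomes one literal quote. */
    public static List<String> splitLine(String line) {
        List<String> fields = new ArrayList<>();
        StringBuilder field = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (inQuotes) {
                if (c == '"') {
                    if (i + 1 < line.length() && line.charAt(i + 1) == '"') {
                        field.append('"'); // doubled quote: keep one literal quote
                        i++;               // skip the second quote of the pair
                    } else {
                        inQuotes = false;  // lone quote closes the quoted field
                    }
                } else {
                    field.append(c);
                }
            } else if (c == '"') {
                inQuotes = true;           // opening quote of a quoted field
            } else if (c == ',') {
                fields.add(field.toString());
                field.setLength(0);
            } else {
                field.append(c);
            }
        }
        fields.add(field.toString());
        return fields;
    }

    public static void main(String[] args) {
        // The raw line is """amogh""",Kelula
        System.out.println(splitLine("\"\"\"amogh\"\"\",Kelula"));
    }
}
```

The first quote opens the field, each following pair of quotes contributes one literal quote, and the last single quote closes the field, so the first value comes out as "amogh" with its quotes intact.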
When I try to parse this CSV file using Apache Commons CSV, I configure CSVFormat as below:
CSVFormat.DEFAULT.builder()
        .setDelimiter(delimiter)
        .setQuote(quoteCharacter) // quoteCharacter="
        .setEscape(escapeCharacter) // escapeCharacter="
        .setSkipHeaderRecord(false)
        .setAllowMissingColumnNames(true)
        .setNullString("")
        .build();
This throws an error when reading the row that contains """ in the data. The exception is:
java.io.IOException: (startline 2) EOF reached before encapsulated token finished.
I do not understand why Apache Commons CSV fails to parse this file. Other parsers, such as PapaParser, parse it without any error. What am I doing wrong here?
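The default CSVFormat (CSVFormat.DEFAULT) should work for triple-quoted values. It uses the double quote as the quote character and sets no escape character, and a doubled quote inside a quoted field is read as one literal quote. Setting the escape character to the same character as the quote is the likely cause of the exception, so leaving out setEscape should be enough. For example: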
for (CSVRecord record : CSVFormat.DEFAULT.parse(reader)) {
    System.out.println(record.toList().toString());
}
The above code prints the correct values when reader is a Reader over a CSV file with the contents you provided:
firstname,lastname
"""amogh""",Kelula
I have deleted my old answer for clarity.
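If you want to keep the rest of your builder configuration, a minimal sketch could look like the following. It assumes Apache Commons CSV 1.9 or later (for builder() and CSVRecord.toList()) on the classpath; the class name QuoteOnlyFormatExample and the parse helper are hypothetical, and the delimiter and quote are hard-coded to ',' and '"' in place of your variables. The only intended change from your configuration is that setEscape is not called.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

import org.apache.commons.csv.CSVFormat;
import org.apache.commons.csv.CSVParser;
import org.apache.commons.csv.CSVRecord;

public class QuoteOnlyFormatExample {

    /** Parses CSV text with a quote character but no escape character. */
    public static List<List<String>> parse(String csv) throws IOException {
        CSVFormat format = CSVFormat.DEFAULT.builder()
                .setDelimiter(',')
                .setQuote('"') // doubled quotes in a quoted field are handled by the quote character
                // setEscape(...) is deliberately left out
                .setSkipHeaderRecord(false)
                .setAllowMissingColumnNames(true)
                .setNullString("")
                .build();

        List<List<String>> rows = new ArrayList<>();
        try (CSVParser parser = format.parse(new StringReader(csv))) {
            for (CSVRecord record : parser) {
                rows.add(record.toList());
            }
        }
        return rows;
    }

    public static void main(String[] args) throws IOException {
        // Same content as the file in the question.
        String csv = "firstname,lastname\n\"\"\"amogh\"\"\",Kelula";
        for (List<String> row : parse(csv)) {
            System.out.println(row);
        }
    }
}
```

Because no header is configured, the first line is returned as an ordinary record, followed by the data row whose first value keeps its surrounding quotes.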