
CSVParser not handling escaped delimiters in unquoted strings


I'm using com.opencsv.CSVParser (5.1) in my Java program.

    final CSVParser csvParser =
        new CSVParserBuilder()
            .withSeparator(',')
            .withQuoteChar('"')
            .withEscapeChar('\\')
            .withIgnoreQuotations(true)
            .build();

My input file contains this line:

3,2.48,E #3,String with \, comma in it,0

I expected the fourth field to be "String with , comma in it". Instead, the parser splits the string into two fields at the escaped comma: "String with " and " comma in it". The documentation for withEscapeChar() says:

Sets the character to use for escaping a separator or quote.

And since quoted separators don't need to be escaped, I assumed (hoped) this would allow me to escape separators in non-quoted strings. I've tried this both with and without withIgnoreQuotations.

Am I missing something, or doing something wrong?


Solution

  • I don't see anything wrong with your code, but I can't parse your data as expected either: I hit the same problem as you. This feels like a bug (which is surprising). If it's not a bug, then the correct usage is too obscure for me.

    Alternatively, you can use Apache Commons CSV. The Maven dependency is:

    <dependency>
        <groupId>org.apache.commons</groupId>
        <artifactId>commons-csv</artifactId>
        <version>1.8</version>
    </dependency>
    

    Sample code:

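    import java.io.FileReader;
    import java.io.IOException;
    import java.io.Reader;
    import java.net.URISyntaxException;
    import java.nio.file.Path;
    import java.nio.file.Paths;
    import org.apache.commons.csv.CSVFormat;
    import org.apache.commons.csv.CSVRecord;
    
    ...
    
    private void commonsCsvTest() throws URISyntaxException, IOException {
        Path path = Paths.get(ClassLoader.getSystemResource("csv/escapes.csv").toURI());
        // try-with-resources closes the reader once parsing is done
        try (Reader in = new FileReader(path.toString())) {
            // Backslash is the escape character, so "\," stays inside the field
            Iterable<CSVRecord> records = CSVFormat.DEFAULT.withEscape('\\').parse(in);
            for (CSVRecord record : records) {
                // Print the fourth field (index 3)
                System.out.println(record.get(3));
            }
        }
    }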
    

    Using your data in the input file "escapes.csv", we get the following output:

    String with , comma in it
    

    You can change how the input file is read to fit your specific situation.
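
If adding a second CSV library is not an option and the input never uses quoted fields, the line can also be split by hand. The following is only a minimal sketch under those assumptions: the class name EscapedCsvSplitter and the method split are hypothetical (they are not part of opencsv or Commons CSV), and quoting is deliberately not handled. It treats the escape character followed by any character as that literal character and splits on every unescaped separator.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration; not part of opencsv or Commons CSV.
final class EscapedCsvSplitter {

    // Splits one line on unescaped separators.
    // Assumption: fields are never quoted, so quote characters get no special treatment.
    static List<String> split(String line, char separator, char escape) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (c == escape && i + 1 < line.length()) {
                // Keep the character after the escape literally and skip over it.
                current.append(line.charAt(++i));
            } else if (c == separator) {
                // An unescaped separator ends the current field.
                fields.add(current.toString());
                current.setLength(0);
            } else {
                // Ordinary character (or a trailing escape with nothing after it).
                current.append(c);
            }
        }
        // The last field has no separator after it.
        fields.add(current.toString());
        return fields;
    }

    public static void main(String[] args) {
        // The sample line from the question; "\\," in Java source is the two characters \ and ,
        String line = "3,2.48,E #3,String with \\, comma in it,0";
        List<String> fields = split(line, ',', '\\');
        System.out.println(fields.size());   // prints 5
        System.out.println(fields.get(3));   // prints String with , comma in it
    }
}
```

This keeps the escaped comma inside the fourth field, but it is no substitute for a full CSV parser: once quoted fields or embedded line breaks appear in the data, a library such as Commons CSV is the safer choice.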