java, csv, apache-commons-csv

Apache commons-csv error with quote


I'm working with org.apache.commons-csv 1.4. This week, in one of our JUnit tests, I discovered this strange behaviour:

    CSVReader reader = null;
    List<String[]> linesCsv = new ArrayList<>();
    FileInputStream fileStream = null;
    InputStreamReader inputStreamReader = null;

    try {
        fileStream = new FileInputStream(file);
        inputStreamReader = new InputStreamReader(fileStream, "ISO-8859-1");
        reader = new CSVReader(inputStreamReader, ',', '"', 0);

        String[] record = null;
        while ((record = reader.readNext()) != null) {
            linesCsv.add(record);
        }

    } catch (Exception e) {
        logger.error("Error in ", e);
    } finally {
        if (inputStreamReader != null) {
            inputStreamReader.close();
        }
        if (fileStream != null) {
            fileStream.close();
        }
        if (reader != null) {
            reader.close();
        }
    }

*ERROR CASE

Input .csv

DAR_123451                  ,"XXXXX Hello World "Hello World XXX "
DAR_123452                  ,"XXXXX Hello World "Hello World XXX "

Java KO:

[0.0] DAR_123451
[0.1] XXXXX Hello World "Hello World XXX\nDAR_123456 ,XXXXX Hello World "Hello World XXX


*CORRECT CASE

Input .csv

DAR_123451                  ,"XXXXX Hello World "Hello World" XXX "
DAR_123452                  ,"XXXXX Hello World "Hello World" XXX "

Java OK:

[0.0] DAR_123451 [0.1] XXXXX Hello World "Hello World" XXX

[1.0] DAR_123452 [1.1] XXXXX Hello World "Hello World" XXX

I can't configure the commons-csv library to handle this properly, and it looks like a bug. How can we correctly read quoted strings that contain a single, unescaped double quote?


Solution

  • When a value is surrounded by double quotes, CSV normally represents a double quote inside that value as two consecutive double quotes.

    With the latest version of commons-csv, your input even throws an exception (IOException: (line 1) invalid char between encapsulated token and delimiter).

    So to include the double quotes correctly, the input needs to look like this:

    DAR_123451                  ,"XXXXX Hello World ""Hello World"" XXX "
    DAR_123452                  ,"XXXXX Hello World ""Hello World"" XXX "
    

    And the test-case then works as expected:

        Reader in = new StringReader(
                "DAR_123451                  ,\"XXXXX Hello World \"\"Hello World\"\" XXX \"\n" +
                        "DAR_123452                  ,\"XXXXX Hello World \"\"Hello World\"\" XXX \"");
        Iterable<CSVRecord> records = CSVFormat.DEFAULT.parse(in);
        for (CSVRecord record : records) {
            for (int i = 0; i < record.size(); i++) {
                System.out.println("At " + i + ": " + record.get(i));
            }
        }
    

    Output:

    At 0: DAR_123451                  
    At 1: XXXXX Hello World "Hello World" XXX 
    At 0: DAR_123452                  
    At 1: XXXXX Hello World "Hello World" XXX 
    

    See https://en.wikipedia.org/wiki/Comma-separated_values#General_functionality for details.
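If you control the code that produces the file, the doubling rule described above is easy to apply when writing. The following is only a minimal sketch using the JDK alone. The class `CsvQuoting` and its methods `quoteField` and `toLine` are hypothetical names made up for this illustration and are not part of commons-csv. A CSV writer library would normally do this escaping for you.

```java
import java.util.Arrays;
import java.util.stream.Collectors;

// Hypothetical helper for illustration only; not part of commons-csv.
public final class CsvQuoting {

    private CsvQuoting() {
    }

    /** Wraps a value in double quotes and doubles every embedded double quote. */
    public static String quoteField(String value) {
        return "\"" + value.replace("\"", "\"\"") + "\"";
    }

    /** Quotes each value and joins the results with commas into one CSV line. */
    public static String toLine(String... values) {
        return Arrays.stream(values)
                .map(CsvQuoting::quoteField)
                .collect(Collectors.joining(","));
    }

    public static void main(String[] args) {
        // The embedded quotes around the inner "Hello World" are doubled in the output.
        System.out.println(toLine("DAR_123451", "XXXXX Hello World \"Hello World\" XXX "));
    }
}
```

A line written this way has the same doubled-quote form as the corrected input shown earlier, so a standard CSV parser should read the inner quotes back as part of the value.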