Tags: java, csv, opencsv, apache-commons-csv

Ignore double quotes in fields when parsing a CSV file with a CSV parser


Sample data:

    Header1, full_name, header3, header4
    20, "bob, XXX", "test", 30
    20, "evan"s,YYY ", "test", 30
    20, "Tom, ZZZ", "test", 30

I parse it with:

    CSVReader csvReader = new CSVReader(reader, ',', '"');
    

The second data row is not parsed as expected, because the full_name value contains an extra double quote.

I want to ignore such cases. Any suggestions would be appreciated.

I am using the OpenCSV Java API for parsing.
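
If "ignore such cases" means skipping the broken rows entirely, one stdlib-only heuristic is to drop every line that contains an odd number of double quotes before the data reaches the CSV parser: in a well-formed single-line record, each quoted field contributes an even number of quotes, escaped `""` pairs included. This is a sketch under two assumptions: no field contains a line break, and a row with two stray quotes would still slip through. `SkipUnbalancedQuotes`, `countQuotes` and `keepBalancedLines` are hypothetical names.

```java
import java.io.BufferedReader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

public class SkipUnbalancedQuotes {

    // Counts the double-quote characters in one physical line.
    static long countQuotes(String line) {
        return line.chars().filter(c -> c == '"').count();
    }

    // Keeps only lines with an even number of quotes; an odd count means at least
    // one quote is unbalanced. Assumes no field contains an embedded line break.
    static List<String> keepBalancedLines(BufferedReader in) {
        List<String> kept = new ArrayList<>();
        in.lines().forEach(line -> {
            if (countQuotes(line) % 2 == 0) {
                kept.add(line);
            } else {
                System.err.println("Skipping malformed line: " + line);
            }
        });
        return kept;
    }

    public static void main(String[] args) {
        String input = "20, \"bob, XXX\", \"test\", 30\n"
                + "20, \"evan\"s,YYY \", \"test\", 30\n"
                + "20, \"Tom, ZZZ\", \"test\", 30";
        // The surviving lines could then be handed to CSVReader as before.
        for (String line : keepBalancedLines(new BufferedReader(new StringReader(input)))) {
            System.out.println(line);
        }
    }
}
```

For the sample data, the first and third rows are kept and the second row (five quotes) is reported on standard error.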

Edit:

I am getting the data from a database. One of the column values contains a single double quote, and because of that the CSV data looks malformed.
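
Because the data comes from a database, the more robust fix is on the export side. RFC 4180 says a field containing a comma, a double quote or a line break should be enclosed in double quotes, and every double quote inside such a field must be escaped by doubling it. A minimal sketch of that rule follows; `CsvEscaping`, `escapeField` and `toCsvRow` are hypothetical helpers, and writing SQL NULL as an empty field is an assumption. A CSV writer library will normally apply this escaping for you.

```java
import java.util.List;
import java.util.stream.Collectors;

public class CsvEscaping {

    // RFC 4180 style: quote the field if it contains a comma, a double quote or a
    // line break, and double every embedded double quote.
    static String escapeField(String value) {
        if (value == null) {
            return ""; // assumption: SQL NULL is exported as an empty field
        }
        boolean needsQuoting = value.contains(",") || value.contains("\"")
                || value.contains("\n") || value.contains("\r");
        if (!needsQuoting) {
            return value;
        }
        return "\"" + value.replace("\"", "\"\"") + "\"";
    }

    // Joins the column values of one database row into one CSV record.
    static String toCsvRow(List<String> values) {
        return values.stream()
                .map(CsvEscaping::escapeField)
                .collect(Collectors.joining(","));
    }

    public static void main(String[] args) {
        // Hypothetical row as it might come back from the query;
        // the full_name column contains one stray double quote.
        List<String> row = List.of("20", "evan\"s,YYY ", "test", "30");
        System.out.println(toCsvRow(row)); // prints 20,"evan""s,YYY ",test,30
    }
}
```

A compliant reader (OpenCSV included) then reads the second column back as `evan"s,YYY ` without any special handling.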


Solution

univocity-parsers can handle unescaped quotes and is also 4x faster than OpenCSV. Try this code:

    import com.univocity.parsers.csv.CsvParser;
    import com.univocity.parsers.csv.CsvParserSettings;

    import java.io.StringReader;
    import java.util.List;

    public class Main {

        public static void main(String... args) {
            String input = "" +
                    "20, \"bob, XXX\", \"test\", 30\n" +
                    "20, \"evan\"s,YYY \", \"test\", 30\n" +
                    "20, \"Tom, ZZZ\", \"test\", 30 ";

            // Default parser settings.
            CsvParserSettings settings = new CsvParserSettings();

            CsvParser parser = new CsvParser(settings);
            List<String[]> rows = parser.parseAll(new StringReader(input));

            // Print values enclosed in [ ] to make sure you are getting the expected result.
            for (String[] row : rows) {
                for (String value : row) {
                    System.out.print("[" + value + "],");
                }
                System.out.println();
            }
        }
    }
    

This will produce:

    [20],[bob, XXX],[test],[30],
    [20],["evan"s],[YYY "],[test],[30],
    [20],[Tom, ZZZ],[test],[30],
    

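Additionally, you can control how unescaped quotes are handled by choosing one of:

    // Treat the value as unquoted and read it up to the next delimiter or line ending.
    settings.setUnescapedQuoteHandling(UnescapedQuoteHandling.STOP_AT_DELIMITER);
    // Keep the quote and continue reading the value as quoted until a closing quote is found.
    settings.setUnescapedQuoteHandling(UnescapedQuoteHandling.STOP_AT_CLOSING_QUOTE);
    // Throw a TextParsingException.
    settings.setUnescapedQuoteHandling(UnescapedQuoteHandling.RAISE_ERROR);
    // Skip the value up to the next delimiter and produce the configured nullValue instead.
    settings.setUnescapedQuoteHandling(UnescapedQuoteHandling.SKIP_VALUE);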
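
To make the "treat the value as unquoted up to the next delimiter" rule concrete, here is a simplified, hand-rolled parser for a single line. It is not univocity-parsers' implementation and ignores values that span several lines. It trims spaces around values, treats `""` inside a quoted value as an escaped quote, and when a quote inside a quoted value is followed by something other than a delimiter or the end of the line, it re-reads that value as unquoted (quotes included) up to the next delimiter. `LenientCsvLine` and `parseLine` are hypothetical names. For the three sample rows it prints the same output as shown above.

```java
import java.util.ArrayList;
import java.util.List;

public class LenientCsvLine {

    // Hand-rolled sketch of lenient quote handling for ONE physical line.
    static List<String> parseLine(String line, char delimiter) {
        List<String> fields = new ArrayList<>();
        int n = line.length();
        int i = 0;
        while (true) {
            while (i < n && line.charAt(i) == ' ') i++; // skip leading spaces
            int start = i;
            int scanFrom = start; // where the unquoted fallback looks for the delimiter
            String value = null;
            if (i < n && line.charAt(i) == '"') {
                StringBuilder sb = new StringBuilder();
                int j = i + 1;
                boolean sawUnescapedQuote = false;
                while (j < n) {
                    char c = line.charAt(j);
                    if (c != '"') { sb.append(c); j++; continue; }
                    if (j + 1 < n && line.charAt(j + 1) == '"') { // "" is an escaped quote
                        sb.append('"'); j += 2; continue;
                    }
                    int k = j + 1;
                    while (k < n && line.charAt(k) == ' ') k++;
                    if (k == n || line.charAt(k) == delimiter) { // proper closing quote
                        value = sb.toString();
                        i = k;
                    } else { // unescaped quote: fall back to unquoted mode from here
                        scanFrom = j;
                        sawUnescapedQuote = true;
                    }
                    break;
                }
                if (value == null && !sawUnescapedQuote) scanFrom = n; // never closed
            }
            if (value == null) { // unquoted value, or quoted value with a stray quote
                int end = line.indexOf(delimiter, scanFrom);
                if (end < 0) end = n;
                value = line.substring(start, end).trim();
                i = end;
            }
            fields.add(value);
            if (i >= n) return fields;
            i++; // step over the delimiter
        }
    }

    public static void main(String[] args) {
        String[] lines = {
            "20, \"bob, XXX\", \"test\", 30",
            "20, \"evan\"s,YYY \", \"test\", 30",
            "20, \"Tom, ZZZ\", \"test\", 30 "
        };
        for (String line : lines) {
            StringBuilder out = new StringBuilder();
            for (String value : parseLine(line, ',')) {
                out.append('[').append(value).append("],");
            }
            System.out.println(out);
        }
    }
}
```

Re-reading from the opening quote keeps the already-consumed `"evan"` in the value, which is why the second row yields `["evan"s]` rather than a truncated field.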
    

When reading large files, avoid loading every row into memory with parseAll(): use a RowProcessor, or iterate over the rows one at a time like this:

    // requires: import java.io.File;
    parser.beginParsing(new File("/path/to/your.csv"));
    
    String[] row;
    // parseNext() returns null once the end of the input is reached
    while ((row = parser.parseNext()) != null) {
        // process row
    }
    

Disclaimer: I'm the author of this library. It's open source and free (Apache 2.0 license).