Search code examples
groovyformattingopencsv

Remove quotes from csv file using opencsv


I am trying to add changes data in a csv file:

This is the sample data:

DATE        status       code                       value     value2
"2016-01-26","Subscription All","119432660","1315529431362550","0.0080099833517888"
"2016-01-26","Subscription All","119432664","5836995058433524","0.033825584764444"
"2016-01-26","Subscription All","119432664","8287300074499777","0.076913377834744"
"2016-01-26","Subscription All","119432664","14870697739968326","0.0074188355187426"

My code used to format the data:

    CSVReader reader = new CSVReader(new FileReader(new File(fileToChange)), CSVParser.DEFAULT_SEPARATOR, CSVParser.NULL_CHARACTER, CSVParser.NULL_CHARACTER, 1)
    info "Read all rows at once"
    List<String[]> allRows = reader.readAll();

    CSVWriter writer = new CSVWriter(new FileWriter(fileToChange), CSVWriter.DEFAULT_SEPARATOR, CSVWriter.NO_QUOTE_CHARACTER)
    writer.writeAll(allRows)
    writer.close()

The output i get is this, with extra quote added instead of removing it.

""2016-01-26"",""Subscription All"",""119432660"",""1315529431362550"",""0.0080099833517888""
""2016-01-26"",""Subscription All"",""119432664"",""5836995058433524"",""0.033825584764444""
""2016-01-26"",""Subscription All"",""119432664"",""8287300074499777"",""0.076913377834744""
""2016-01-26"",""Subscription All"",""119432664"",""14870697739968326"",""0.0074188355187426""

I want to remove the quotes. Please can someone help.

Also, is it possible to change the date format to yyyymmdd instead of yyyy-mm-dd?

   allRows.each { String[] theLine ->
        String newDate = theLine[0].replaceAll('-', '')
        String newline = theLine.eachWithIndex { String s, int i -> return i > 0 ? s : newDate}
        writer.writeLine(newline)
    }

Thanks


Solution

  • When you instantiated your CSVReader you told it to treat no characters as quotes, therefore it read the existing quotes as data and did not remove them.

    When you told CSVWriter not to add any quotes it honored your request. However, the input data contained quote characters, and the convention for including quotes inside a string in CSV is to double the quotes. Thus the string value

    ABC"DEF
    

    gets coded in CSV as

    "ABC""DEF"
    

    So the result you see is the combination of not removing the quotes on input (you told it not to) and then doubling the quotes on output.

    To solve this change the input option from NULL_CHARACTER to DEFAULT_QUOTE_CHARACTER. However be aware that if any of your data actually contains embedded quotes or commas the resulting output will not be valid CSV.

    Also I think this might be a valid bug report against OpenCSV. I believe that OpenCSV needs to inform you if it is about to generate invalid CSV when you told it to omit quotes, probably via a runtime exception. Although I suppose they might argue that you chose to work without a net and should accept whatever you get. Personally I go for the "principle of least surprise", which IMHO would be not to double quotes when the output is unquoted.