java, csv, parsing, quote

How to parse CSV file with irregular use of quotes?


I have to parse a comma-separated CSV file in which some columns use double quotes irregularly. The file entries look like this:

"1920,The False Road,American,Fred Niblo,""Enid Bennett, Lloyd Hughes""
"1920,813,American,""Charles Christie, Scott Sidney"",""Wedgwood Nowell, Ralph Lewis, Wallace Beery, Laura La Plante"",mystery


+---+----------+------------------------------------+-----+
|   | Column 1 | Column 4                           | ... |
+---+----------+------------------------------------+-----+
| 1 | 1920     | Fred Niblo                         | ... |
| 2 | 1920     | ""Charles Christie, Scott Sidney"" | ... |
+---+----------+------------------------------------+-----+

As you can see, column 4 is unquoted in the first entry but enclosed in doubled quotes in the second.

Is there a way to handle this irregular quoting?


Solution

  • To be valid, your CSV should actually look like this:

    1920,The False Road,American,Fred Niblo,"Enid Bennett, Lloyd Hughes",
    1920,813,American,"Charles Christie, Scott Sidney","Wedgwood Nowell, Ralph Lewis, Wallace Beery, Laura La Plante",mystery
    

    (Note also the trailing comma at the end of the first line: it adds an empty sixth field, so both lines have the same number of columns.)

    Here, every field that contains a comma is enclosed in double quotes ("), so any standard CSV parser or library can read it correctly.
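    To illustrate what such a parser does with the corrected lines, here is a minimal hand-written splitter in Java. It is only a sketch, not a replacement for a real CSV library: the names CsvSplit and splitRecord are made up for this example, and it assumes that no record contains a line break inside a quoted field.

```java
import java.util.ArrayList;
import java.util.List;

class CsvSplit {

    /**
     * Splits one CSV record into fields using RFC 4180 style quoting:
     * a field may be enclosed in double quotes, and a literal quote inside
     * such a field is written as two quotes (""). Assumes the record has
     * no embedded line breaks.
     */
    static List<String> splitRecord(String line) {
        List<String> fields = new ArrayList<>();
        StringBuilder field = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (inQuotes) {
                if (c == '"') {
                    if (i + 1 < line.length() && line.charAt(i + 1) == '"') {
                        field.append('"'); // escaped quote ("") inside a quoted field
                        i++;
                    } else {
                        inQuotes = false; // closing quote of the field
                    }
                } else {
                    field.append(c); // commas inside quotes are kept as data
                }
            } else if (c == '"') {
                inQuotes = true; // opening quote of the field
            } else if (c == ',') {
                fields.add(field.toString()); // unquoted comma ends the field
                field.setLength(0);
            } else {
                field.append(c);
            }
        }
        fields.add(field.toString()); // last field (empty if the line ends with a comma)
        return fields;
    }

    public static void main(String[] args) {
        String line = "1920,The False Road,American,Fred Niblo,\"Enid Bennett, Lloyd Hughes\",";
        List<String> fields = splitRecord(line);
        System.out.println(fields.size()); // 6
        System.out.println(fields.get(4)); // Enid Bennett, Lloyd Hughes
    }
}
```

The quoted cast list comes back as a single field, and the trailing comma yields an empty sixth field.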

    However, it looks as if your CSV was somehow converted into a one-field CSV: each whole line was enclosed in quotes and the quotes already in it were escaped by doubling them (as expected), except that the closing quote at the end of each line is missing.
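    You can check this diagnosis by applying that conversion to one of the corrected lines yourself. In the sketch below, the hypothetical helper wrapAsSingleField doubles every quote and encloses the whole record in quotes; dropping the final character of the result reproduces the second line from the question exactly.

```java
class WrapDemo {

    /**
     * Encodes a whole CSV record as one quoted CSV field: every " becomes ""
     * and the result is enclosed in quotes. This is the conversion that
     * appears to have been applied to the file.
     */
    static String wrapAsSingleField(String record) {
        return "\"" + record.replace("\"", "\"\"") + "\"";
    }

    public static void main(String[] args) {
        String correct = "1920,813,American,\"Charles Christie, Scott Sidney\","
                + "\"Wedgwood Nowell, Ralph Lewis, Wallace Beery, Laura La Plante\",mystery";
        String wrapped = wrapAsSingleField(correct);
        // Remove the final (closing) quote, which is what is missing in the file.
        System.out.println(wrapped.substring(0, wrapped.length() - 1));
    }
}
```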


    To fix this, you could first append a quote to the end of each line, save the file, and then parse it as CSV. This returns a single cell per row, containing all of that row's data.

    You could then write the contents of each of those cells to another file and parse that file as CSV again, which should give you the correct fields.
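    A minimal Java sketch of these steps, assuming that every broken line starts with a quote and lacks only its closing quote, that each record sits on a single line, and that the file is UTF-8. The names RepairCsv, unwrapLine and repair are hypothetical. Because each repaired line is always exactly one quoted field, this sketch does the first "parse as CSV" pass directly (strip the enclosing quotes and turn "" back into ") instead of calling a CSV library, and writes the result straight to the second file.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

class RepairCsv {

    // First pass for one line: append the missing closing quote, then decode
    // the line as a single quoted CSV field (drop enclosing quotes, "" -> ").
    static String unwrapLine(String line) {
        if (line.isEmpty() || line.charAt(0) != '"') {
            return line; // assumption: lines without a leading quote were never wrapped
        }
        String closed = line + "\"";                              // add the missing quote
        String inner = closed.substring(1, closed.length() - 1); // strip the enclosing quotes
        return inner.replace("\"\"", "\"");                       // un-escape doubled quotes
    }

    // Reads the broken file, unwraps every line and writes a regular CSV file
    // that a standard CSV parser can then read field by field.
    static void repair(Path in, Path out) throws IOException {
        List<String> fixed = new ArrayList<>();
        for (String line : Files.readAllLines(in, StandardCharsets.UTF_8)) {
            fixed.add(unwrapLine(line));
        }
        Files.write(out, fixed, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        // Usage: java RepairCsv.java movies.csv movies-fixed.csv (placeholder file names)
        repair(Paths.get(args[0]), Paths.get(args[1]));
    }
}
```

The output file then contains lines like the corrected ones shown above, which the second pass can read with any standard CSV parser.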