
Is CSV data with missing leading quotations considered malformed?


I am using OpenCSV to read CSV files. Looking over the docs, I don't see guidelines on how to handle malformed data.

I have a CSV file with all the expected features: fields are separated by commas, and each field is surrounded by quotes in case a value contains a comma. However, every line (except the header) is missing its leading quote. Here is an example:

"Header 1","Header2"
value1","value2"
value1","value2"

The CSV parser ended up skipping every other line due to the way the quotes were lined up, which obviously causes problems.
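A rough sketch of why the lines pair up, assuming a reader that simply toggles an "inside quotes" flag on every double quote and only ends a record at a line break outside quotes. This is not OpenCSV's actual implementation; `QuoteToggleDemo` and `groupIntoRecords` are hypothetical names used only for illustration.

```java
import java.util.ArrayList;
import java.util.List;

class QuoteToggleDemo {
    // Groups physical lines into logical records: a line break only ends
    // a record when we are not currently inside a quoted field.
    static List<String> groupIntoRecords(List<String> lines) {
        List<String> records = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        for (String line : lines) {
            for (int i = 0; i < line.length(); i++) {
                if (line.charAt(i) == '"') {
                    inQuotes = !inQuotes; // every quote flips the state
                }
            }
            current.append(line);
            if (inQuotes) {
                current.append('\n'); // the line break is treated as field data
            } else {
                records.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            records.add(current.toString()); // unterminated trailing record
        }
        return records;
    }

    public static void main(String[] args) {
        List<String> lines = List.of(
            "\"Header 1\",\"Header2\"",
            "value1\",\"value2\"",
            "value1\",\"value2\"");
        // The two data lines each hold three quotes, so they merge into one record.
        System.out.println(groupIntoRecords(lines).size()); // prints 2
    }
}
```

Each data line has an odd number of quotes, so the last quote on one line opens a field that is only closed by the first quote on the next line.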

I would consider this an error: I know what the data should look like, and the first column is missing its opening quotation mark. But as far as the CSV spec is concerned, could this be considered valid? If so, I suppose I would have to build extra checks myself to make sure I am not missing any lines, even though the file contains valid CSV data.
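One possible extra check, sketched under the assumption that no field in the file legitimately spans multiple lines: count the double quotes on each physical line and flag any line with an odd count before handing the file to the parser. Escaped quotes (`""`) come in pairs, so they do not change the parity. `QuoteBalanceCheck` and `findSuspectLines` are hypothetical names, not part of OpenCSV.

```java
import java.util.ArrayList;
import java.util.List;

class QuoteBalanceCheck {
    // Returns the 1-based numbers of lines containing an odd number of
    // double quotes. Assumes every record sits on a single physical line.
    static List<Integer> findSuspectLines(List<String> lines) {
        List<Integer> suspects = new ArrayList<>();
        for (int n = 0; n < lines.size(); n++) {
            String line = lines.get(n);
            int quotes = 0;
            for (int i = 0; i < line.length(); i++) {
                if (line.charAt(i) == '"') {
                    quotes++;
                }
            }
            if (quotes % 2 != 0) {
                suspects.add(n + 1);
            }
        }
        return suspects;
    }

    public static void main(String[] args) {
        List<String> lines = List.of(
            "\"Header 1\",\"Header2\"",
            "value1\",\"value2\"",
            "value1\",\"value2\"");
        System.out.println(findSuspectLines(lines)); // prints [2, 3]
    }
}
```

If the file can contain quoted fields with embedded line breaks, this check will produce false positives, and comparing the parsed record count against an expected count may be a better fit.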


Solution

  • According to the RFC for CSV files (RFC 4180):

    While there are various specifications and implementations for the CSV format, there is no formal specification in existence, which allows for a wide variety of interpretations of CSV files.

    So, simply put: malformed? No. Informal? Yes. Even this article (linked in the RFC) mentions that lines can mix and match quoted and unquoted fields.