Disclaimer: I'm not actually parsing a CSV, but a CSV-like format; I'm not interested in using a pre-built library.
What is the proper way to parse the following two lines?
a,b"c"d,e
a,"bc"d,e
That is, (a) has quotes in the middle of a value, and (b) has a quote at the start of a value but no closing quote immediately before the next delimiter.
I can't figure out which way of handling these cases would be the most intuitive.
My thinking is that (a) should be parsed as the fields a, b"c"d and e (quotes left in), and that (b) should be parsed the same way, as a, "bc"d and e. But then let me introduce a third case, a,"b,c"d,e: do we split on that second comma, between "b" and "c", or not?
Here is how you would parse it if you want to be consistent with Excel:
input:
a,b"c"d,e
a,"bc"d,e
a,"b,c"d,e
parsed (in JSON):
[
["a", "b\"c\"d", "e"],
["a", "bcd", "e"],
["a", "b,cd", "e"]
]
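The parsing logic, as far as the examples above show, is: a double quote opens a quoted section only when it is the very first character of a cell. Inside that section commas are literal and the enclosing quotes are dropped; anything after the closing quote, up to the next comma, is appended as-is. A double quote anywhere else in a cell is kept as a literal character.
Note that this means that a space after a cell-delimiting comma, followed by a double quote, gives a different result than no space after the comma (followed by a double quote).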
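For illustration, here is a minimal Python sketch that reproduces the three parsed rows above. It is inferred from those examples only, not from any documented Excel behaviour; the function name parse_line and its parameters are hypothetical, and the handling of a doubled quote inside a quoted section is an extra assumption borrowed from ordinary CSV.

```python
def parse_line(line, delimiter=",", quote='"'):
    """Split one line into cells, treating a quote as special only at cell start."""
    fields = []
    current = []            # characters collected for the cell being read
    in_quotes = False       # True while inside a quoted section
    at_cell_start = True    # True until the first character of a cell is consumed
    i = 0
    n = len(line)
    while i < n:
        ch = line[i]
        if in_quotes:
            if ch == quote:
                if i + 1 < n and line[i + 1] == quote:
                    # Assumption (not from the examples): "" inside a quoted
                    # section stands for one literal quote, as in ordinary CSV.
                    current.append(quote)
                    i += 1
                else:
                    in_quotes = False   # closing quote is dropped
            else:
                current.append(ch)      # commas are literal inside quotes
        elif ch == delimiter:
            fields.append("".join(current))
            current = []
            at_cell_start = True
            i += 1
            continue
        elif ch == quote and at_cell_start:
            in_quotes = True            # opening quote is dropped
        else:
            current.append(ch)          # includes quotes in the middle of a cell
        at_cell_start = False
        i += 1
    fields.append("".join(current))
    return fields


for row in ['a,b"c"d,e', 'a,"bc"d,e', 'a,"b,c"d,e']:
    print(parse_line(row))
```

With a space before the quote, as in a, "b,c",d, the quote is no longer the first character of its cell, so this sketch keeps it literally and splits on the inner comma.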