language-agnostic, parsing, csv

Split a string ignoring quoted sections


Given a string like this:

a,"string, with",various,"values, and some",quoted

What is a good algorithm to split this based on commas while ignoring the commas inside the quoted sections?

The output should be an array:

[ "a", "string, with", "various", "values, and some", "quoted" ]


Solution

  • If my language of choice did not offer a built-in way to do this, I would initially consider two options as the easy way out:

    1. Pre-parse the string and replace the commas inside the quoted sections with a control character, split on the remaining commas, then post-process the resulting array to turn each control character back into a comma.

    2. Alternatively, split on every comma, then post-process the resulting array into a new array: check each entry for a leading quote and concatenate entries (restoring the commas between them) until one ends with a terminating quote.

    These are hacks, however, and if this is a purely 'mental' exercise then I suspect they will prove unhelpful. If this is a real-world problem, it would help to know the language so that we could offer some specific advice.
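
Since no language was specified, here is a rough Python sketch of the two options above, just to make them concrete. The names `split_with_placeholder`, `split_and_rejoin` and `PLACEHOLDER` are made up for illustration. Both functions assume that quotes are never escaped or doubled inside a field, that a quoted field starts right after a comma, and (for the first one) that the placeholder character never appears in the input.

```python
from typing import List

# Hypothetical placeholder: the ASCII unit separator, assumed absent from the input.
PLACEHOLDER = "\x1f"


def split_with_placeholder(line: str) -> List[str]:
    """Option 1: mask quoted commas, split, then restore the commas."""
    in_quotes = False
    masked = []
    for ch in line:
        if ch == '"':
            in_quotes = not in_quotes  # toggle state and drop the quote itself
        elif ch == "," and in_quotes:
            masked.append(PLACEHOLDER)  # hide the comma from the split
        else:
            masked.append(ch)
    pieces = "".join(masked).split(",")
    return [piece.replace(PLACEHOLDER, ",") for piece in pieces]


def split_and_rejoin(line: str) -> List[str]:
    """Option 2: split on every comma, then glue quoted pieces back together."""
    fields = []
    pending = None  # pieces of a quoted field that is still open
    for piece in line.split(","):
        if pending is not None:
            pending.append(piece)
            if piece.endswith('"'):
                # Closing quote found: rejoin with commas and strip both quotes.
                fields.append(",".join(pending)[1:-1])
                pending = None
        elif piece.startswith('"') and not (len(piece) > 1 and piece.endswith('"')):
            pending = [piece]  # opening quote without a closing one in this piece
        elif piece.startswith('"'):
            fields.append(piece[1:-1])  # whole quoted field contained no comma
        else:
            fields.append(piece)
    if pending is not None:
        # Unterminated quote: keep the raw text rather than guessing.
        fields.append(",".join(pending))
    return fields


if __name__ == "__main__":
    sample = 'a,"string, with",various,"values, and some",quoted'
    print(split_with_placeholder(sample))
    print(split_and_rejoin(sample))
    # Both print: ['a', 'string, with', 'various', 'values, and some', 'quoted']
```

Neither version copes with escaped quotes, which is part of why these are hacks. In Python specifically, the standard `csv` module already handles this input, so the sketch is only meant to illustrate the two approaches.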