I receive lots of files, over which I have no control, that I need to split on a delimiter. However, I don't want to split when the delimiter is inside quotes. So, column1, column2, column3 becomes
column1
column2
column3
However, column1, "column2," column3 becomes
column1
"column2," column3
This works with the following regex (in C#):
((?<=\")[^\"]*(?=\"(,|$)+)|(?<=,|^)[^,\"]*(?=,|$))
My problem is a line that contains only one double quote (an opening or a closing quote with no partner). For example, column1, column2", column3 returns
column1
column3
while it should return
column1
column2"
column3
I have found lots of related regexes, but all of them fail on this particular example.
You can match all the fields you need using
Regex.Matches(text, "(?:\"[^\"]*\"|[^,])+|(?<![^,])(?![^,])")
Details:
- (?:\"[^\"]*\"|[^,])+ - one or more occurrences of:
  - "[^"]*" - a double quote, zero or more characters other than a double quote, and then a double quote (if there can be "" inside, replace with "[^"]*(?:""[^"]*)*")
  - | - or
  - [^,] - any character other than a comma
- | - or
- (?<![^,])(?![^,]) - a location that is either at the start of the string or immediately preceded by a comma, and is either at the end of the string or immediately followed by a comma.
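As a rough sketch, the pattern above could be wrapped in a small helper and run against the three sample lines from the question. The FieldSplitter class and its Split method are hypothetical names used only for illustration. The Trim() call is also an assumption: the regex itself keeps the space that follows each comma, so trimming is added here only to reproduce the output shown in the question.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration; not part of any library.
public static class FieldSplitter
{
    // Same pattern as in the answer, written as a verbatim string
    // (inside @"...", a doubled "" stands for one literal double quote).
    private static readonly Regex FieldPattern =
        new Regex(@"(?:""[^""]*""|[^,])+|(?<![^,])(?![^,])");

    public static string[] Split(string line)
    {
        return FieldPattern.Matches(line)
            .Cast<Match>()
            // Assumption: surrounding whitespace is not wanted in the result.
            .Select(m => m.Value.Trim())
            .ToArray();
    }

    public static void Main()
    {
        var samples = new[]
        {
            "column1, column2, column3",
            "column1, \"column2,\" column3",
            "column1, column2\", column3"
        };

        foreach (var line in samples)
        {
            // Print the fields separated by " | " so the boundaries are visible.
            Console.WriteLine(string.Join(" | ", Split(line)));
        }
    }
}
```

With the unbalanced quote in the third sample, the quoted alternative cannot find a closing quote, so the engine falls back to consuming the lone quote as an ordinary non-comma character, which is why column2" survives as its own field. An empty field (for example the middle of a,,b) is picked up by the second, zero-width alternative.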