I am trying to parse a long string of 'objects' enclosed by quotes delimitated by commas. EX:
s='"12345","X","description of x","X,Y",,,"345355"'
output=['"12345"','"X"','"description of x"','"X,Y"','','','"345355"']
I am using split to delimitate by commas:
s=["12345","X","description of x","X,Y",,,"345355"]
s.split(',')
This almost works but the output for the string segment ...,"X,Y",... ends up parsing the data enclosed by quotes to "X and Y". I need the split to ignore commas inside of quotes
Is there a way I can delaminate by commas except for in quotes?
I tried using a regex but it ignores the ...,,,... in data because there are no quotes for blank data in the file I'm parsing. I am not an expert with regex and this sample I used from Python split string on quotes. I do understand what this example is doing and not sure how I could modify it to allow parse data that is not enclosed by quotes.
Thanks!
this should work:
In [1]: import re
In [2]: s = '"12345","X","description of x","X,Y",,,"345355"'
In [3]: pattern = r"(?<=[\",]),(?=[\",])"
In [4]: re.split(pattern, s)
Out[4]: ['"12345"', '"X"', '"description of x"', '"X,Y"', '', '', '"345355"']
(?<=...)
is a "positive lookbehind assertion". It causes your pattern (in this case, just a comma, ",
") to match commas in the string only if they are preceded by the pattern given by ...
. Here, ...
is [\",]
, which means "either a quotation mark or a comma".(?=...)
is a "positive lookahead assertion". It causes your pattern to match commas in the string only if they are followed by the pattern specified as ...
(again, [\",]
: either a quotation mark or a comma).