I'm facing this odd CSV formatting, which contains unescaped , characters inside quoted fields:
641,"Harstad/Narvik Airport, Evenes","Harstad/Narvik","Norway","EVE","ENEV",68.491302490234,16.678100585938,84,1,"E","Europe/Oslo","airport","OurAirports"
I need to return a list like this:
[641,'Harstad/Narvik Airport Evenes', 'Harstad/Narvik', 'Norway', 'EVE', 'ENEV', 68.491302490234,16.678100585938,84,1, 'E', 'Europe/Oslo', 'airport', 'OurAirports']
I have two regexes that each match part of the string:
(\d+\.?\d*)
matches numbers
(["'])(?:(?=(\\?))\2.)*?\1
matches any characters between two single or double quotes
Is there a way to merge the two matches into one result?
You may use this regex:
>>> import re
>>> s = '641,"Harstad/Narvik Airport, Evenes","Harstad/Narvik","Norway","EVE","ENEV",68.491302490234,16.678100585938,84,1,"E","Europe/Oslo","airport","OurAirports"'
>>> csvData = re.findall(r'"[^"\\]*(?:\\.[^"\\]*)*"|\d+(?:\.\d+)?', s)
>>> print(csvData)
['641', '"Harstad/Narvik Airport, Evenes"', '"Harstad/Narvik"', '"Norway"', '"EVE"', '"ENEV"', '68.491302490234', '16.678100585938', '84', '1', '"E"', '"Europe/Oslo"', '"airport"', '"OurAirports"']
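The matches returned by re.findall are still strings, and the quoted fields keep their surrounding quotes. To get the typed list shown in the question, each match can be post-processed. The following is only a minimal sketch: the helper name parse_line is hypothetical, and it assumes that commas inside quoted fields should simply be dropped (as in the desired output) and that escaped characters inside quotes do not need to be unescaped.

```python
import re

# Same pattern as in the answer: a quoted string OR an integer/decimal number.
FIELD_RE = re.compile(r'"[^"\\]*(?:\\.[^"\\]*)*"|\d+(?:\.\d+)?')

def parse_line(line):
    """Hypothetical helper: tokenize one line and convert each field."""
    fields = []
    for token in FIELD_RE.findall(line):
        if token.startswith('"'):
            # Quoted field: strip the surrounding quotes and drop embedded commas
            # (assumption taken from the desired output in the question).
            fields.append(token[1:-1].replace(',', ''))
        elif '.' in token:
            fields.append(float(token))
        else:
            fields.append(int(token))
    return fields

s = ('641,"Harstad/Narvik Airport, Evenes","Harstad/Narvik","Norway","EVE",'
     '"ENEV",68.491302490234,16.678100585938,84,1,"E","Europe/Oslo",'
     '"airport","OurAirports"')
print(parse_line(s))
```

If the file is otherwise regular CSV, the standard csv module may be a simpler route than a regex, since it already understands commas inside quoted fields.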
RegEx Details:
"[^"\\]*(?:\\.[^"\\]*)*"
: Match a quoted string that allows escaped quotes or any other escaped character inside, e.g. "foo\"bar", as a single element
|
: OR
\d+(?:\.\d+)?
: Match an integer or a decimal number
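To see both alternatives of the pattern at work, here is a small check on a made-up input (the sample string is an assumption for illustration, not data from the question). It shows that a quoted field containing an escaped quote and a comma comes back as one element, and it also points out one limitation: the numeric branch has no provision for a sign.

```python
import re

pattern = re.compile(r'"[^"\\]*(?:\\.[^"\\]*)*"|\d+(?:\.\d+)?')

# Made-up sample: an integer, a quoted field with an escaped quote and a
# comma inside it, and a decimal number.
sample = r'7,"foo\"bar, baz",3.5'
tokens = pattern.findall(sample)
print(tokens)

# Caveat: the numeric alternative does not include a leading minus sign,
# so only the digits of a negative number are captured.
negative = pattern.findall('-2.5')
print(negative)
```

If negative values can occur in the data, prefixing the numeric branch with an optional sign (for example -?\d+(?:\.\d+)?) is one possible adjustment.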