python regex regex-group

Regex to match any number AND any characters between quotes


I'm dealing with an oddly formatted CSV line in which commas inside quoted fields are not escaped:

   641,"Harstad/Narvik Airport, Evenes","Harstad/Narvik","Norway","EVE","ENEV",68.491302490234,16.678100585938,84,1,"E","Europe/Oslo","airport","OurAirports"  

I need to return a list like this:

[641,'Harstad/Narvik Airport Evenes', 'Harstad/Narvik', 'Norway', 'EVE', 'ENEV', 68.491302490234,16.678100585938,84,1, 'E', 'Europe/Oslo', 'airport', 'OurAirports']

I have two regexes, each of which matches part of the string:

  • (\d+\.?\d*) matches numbers
  • (["'])(?:(?=(\\?))\2.)*?\1 matches any characters between two single or double quotes

Is there a way to merge the two into a single result?


Solution

  • You may use this regex:

    >>> import re
    >>> s = '641,"Harstad/Narvik Airport, Evenes","Harstad/Narvik","Norway","EVE","ENEV",68.491302490234,16.678100585938,84,1,"E","Europe/Oslo","airport","OurAirports"'
    
    >>> csvData = re.findall(r'"[^"\\]*(?:\\.[^"\\]*)*"|\d+(?:\.\d+)?', s)
    >>> print(csvData)
    
    ['641', '"Harstad/Narvik Airport, Evenes"', '"Harstad/Narvik"', '"Norway"', '"EVE"', '"ENEV"', '68.491302490234', '16.678100585938', '84', '1', '"E"', '"Europe/Oslo"', '"airport"', '"OurAirports"']
    

    RegEx Details:

    • "[^"\\]*(?:\\.[^"\\]*)*": Match a quoted string, allowing escaped quotes or any other escaped character inside, so that e.g. "foo\"bar" is matched as a single element
    • |: OR
    • \d+(?:\.\d+)?: Match an integer or a decimal number
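
The findall result above still contains the surrounding quotes, and the numbers come back as strings. A minimal sketch of one way to get closer to the list asked for in the question: the same pattern with a capture group added around each alternative, so the quoted text and the number can be told apart. The names FIELD and parse_line are hypothetical, introduced only for this illustration, and the sketch assumes unsigned numbers as in the sample line.

```python
import re

# Same pattern as in the answer, with one capture group per alternative:
# group 1 = text inside the quotes, group 2 = an unquoted number.
FIELD = re.compile(r'"([^"\\]*(?:\\.[^"\\]*)*)"|(\d+(?:\.\d+)?)')

def parse_line(line):
    """Return the fields of one line, with numbers converted to int or float."""
    fields = []
    for m in FIELD.finditer(line):
        quoted, number = m.group(1), m.group(2)
        if number is not None:
            # A dot means a decimal number; otherwise treat it as an integer.
            fields.append(float(number) if '.' in number else int(number))
        else:
            # Quoted field: keep the inner text, without the quotes.
            fields.append(quoted)
    return fields

s = '641,"Harstad/Narvik Airport, Evenes","Harstad/Narvik","Norway","EVE","ENEV",68.491302490234,16.678100585938,84,1,"E","Europe/Oslo","airport","OurAirports"'
print(parse_line(s))
```

Note that this keeps the comma inside 'Harstad/Narvik Airport, Evenes'. Escaped characters such as \" are also left as written in the input, and a negative number would lose its sign because the pattern does not match a leading minus.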