Search code examples
pythonquotesstring-parsing

Parsing a string with nested quotes


I need to parse a string that looks like this:

"prefix 'field1', '', 'field2', 'field3', 'select ... where (column1 = '2017') and ((('literal1', 'literal2', 'literal3', 'literal4', 'literal5', 'literal6', 'literal7') OVERLAPS column2 Or ('literal8') OVERLAPS column3 And (column4 > 0.0 Or column6 > 0.0)) And column7 IN_COMMUNITY [int1] And column5 = 'literal9')  LIMIT 0 ', 'field5', 'field6', 'field7', 'field8', 'field9', '', 'field10'"

And I would like to get a list like the following:

['field1', '', 'field2', 'field3', 'select ... where (column1 = '2017') and ((('literal1', 'literal2', 'literal3', 'literal4', 'literal5', 'literal6', 'literal7') OVERLAPS column2 Or ('literal8') OVERLAPS column3 And" (column4 > 0.0 Or column6 > 0.0)) And column7 IN_COMMUNITY [int1] And column5 = 'literal9')  LIMIT 0 ', 'field5', 'field6', 'field7', 'field8', 'field9', '', 'field10']

I've tried it with regular expressions, but it doesn't work in the substring of the pseudo-SQL statement.

How can I get that list?


Solution

  • Here's a brittle way to do it if you know what the SQL string is supposed to look like.

    We match the SQL string, and split the rest into start and end strings.

    Then we match the simpler field pattern and build up a list from start for that pattern, add back in the SQL match, and then the fields from the end string.

    sqlmatch = 'select .* LIMIT 0'
    fieldmatch = "'(|\w+)'"
    match = re.search(sqlmatch, mystring)
    startstring = mystring[:match.start()]
    sql = mystring[match.start():match.end()]
    endstring = mystring[match.end():]
    result = []
    for found in re.findall(fieldmatch, startstring):
        result.append(found)
    
    result.append(sql)
    for found in re.findall(fieldmatch, endstring):
        result.append(found)
    

    Then the result list looks like the following:

    ['field1',
     '',
     'field2',
     'field3',
     'select ... where (column1 = \'2017\') and (((\'literal1\', \'literal2\', \'literal3\', \'literal4\', \'literal5\', \'literal6\', \'literal7\') OVERLAPS column2 Or (\'literal8\') 
    OVERLAPS column3 And" (column4 > 0.0 Or column6 > 0.0)) And column7 IN_COMMUNITY [int1] And column5 = \'literal9\')  LIMIT 0',
     'field5',
     'field6',
     'field7',
     'field8',
     'field9',
     '',
     'field10']