How to split & replace strings except in quotes, while keeping the quotes, in Python


I'm making a Lisp dialect and I have a function, compute, which turns, for example, (+ 1 2 (- 2 1)) into ('+', '1', '2', ('-', '2', '1')).

Here is my current implementation.

import shlex

def compute(expr):
    expr = expr.replace('(', '( ').replace(')', ' )')
    expr = shlex.split(expr)

    new = ''
    for index, i in enumerate(expr):
        if   i == '(': new += i
        elif i == ')': new += i
        elif (not(len(expr) == 1 or len(expr) == 0)) and expr[index+1] == ')':
            new += ('"' + i + '"')
        else:
            new += ('"' + i + '",')
    
    expr = eval(new)
    return expr

This works for all expressions, except the ones using strings.

  1. ( and ) get padded with spaces (becoming "( " and " )") even inside strings.
  2. As you can see, I'm using shlex so that spaces inside strings aren't split on, but I want the quotes to be kept. So (println "Hello, world!") should turn into ('println', '"Hello, world!"') and not ('println', 'Hello, world!').
  3. \" completely breaks everything.

Solution

  • You have to pass posix=False to the shlex.split() call; otherwise it strips the double quotes:

    shlex.split(expr)
    ['(', 'println', 'Hello, world!', ')']
    

    Whereas:

    shlex.split(expr, posix=False)
    ['(', 'println', '"Hello, world!"', ')']
    

    However, this doesn't solve the issue entirely. You also have to change the code that wraps i in quotes to this:

    new = ''
    for index, i in enumerate(expr):
        if i in ('(', ')'):
            new += i  # Parentheses pass through unchanged
        elif index + 1 < len(expr) and expr[index+1] == ')':
            # Last item before ')': no trailing comma.
            # repr() yields a valid string literal even if i contains "
            new += repr(i)
        else:
            new += repr(i) + ','  # Same quoting, followed by a comma
    

    Now with all the changes in place, print(compute('(println "Hello, world!")')) prints:

    ('println', '"Hello, world!"')
    

    Note: compute("(+ 1 2 (- 2 1))") still works as intended.
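The shlex approach above still pads ( and ) inside strings and does not deal with \" (points 1 and 3 in the question). One possible way around both is to skip shlex and eval entirely and tokenize with a regular expression, then build the tuples recursively. The following is only a minimal sketch under that assumption; TOKEN_RE, tokenize and parse are illustrative names, not part of the original code, and the token pattern is a guess at what the dialect needs.

```python
import re

# Assumed token grammar (hypothetical): a double-quoted string that may contain
# backslash escapes, a single parenthesis, or a run of other non-space characters.
TOKEN_RE = re.compile(r'"(?:\\.|[^"\\])*"|[()]|[^\s()"]+')


def tokenize(source):
    """Split source into tokens; quoted strings stay whole, quotes included."""
    return TOKEN_RE.findall(source)


def parse(tokens):
    """Turn a flat token list into nested tuples (consumes the list)."""
    if not tokens:
        raise SyntaxError("empty expression")
    token = tokens.pop(0)
    if token == ')':
        raise SyntaxError("unexpected ')'")
    if token != '(':
        return token  # atom or quoted string, returned unchanged
    items = []
    while tokens and tokens[0] != ')':
        items.append(parse(tokens))
    if not tokens:
        raise SyntaxError("missing ')'")
    tokens.pop(0)  # discard the closing ')'
    return tuple(items)


def compute(expr):
    return parse(tokenize(expr))


print(compute('(+ 1 2 (- 2 1))'))            # ('+', '1', '2', ('-', '2', '1'))
print(compute('(println "Hello, world!")'))  # ('println', '"Hello, world!"')
print(compute('(println "(hi)")'))           # ('println', '"(hi)"')
```

Because the string alternative comes first in the pattern, parentheses and escaped quotes inside a string are consumed as part of that string. Note that re.findall silently skips text it cannot match (for example an unterminated quote), so a real implementation would probably want to detect that and report an error.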