
Adding special characters to column names


I have a list of column names, stored as strings, like this:

lst = ["plug", "plug+wallet", "wallet-phone"]

I want to wrap each name in df[] and single quotes ('). I am using a regex substitution for this. The regex I am using works fine when the names in the list are already quoted, like this:

import re

lst = [" 'plug'", "'plug'+'wallet'", "'wallet'-'phone'"]
x = []
for l in lst:
    x.append(re.sub(r"('[^+\-*\/'\d]+')", r'df[\1]', l))
print(x)

The result is as expected:

x: [" df['plug']", "df['plug']+df['wallet']", "df['wallet']-df['phone']"]

But when the names in the list are not quoted, like this:

lst = ["plug", "plug+wallet", "wallet-phone"]
x = []
y = []
for l in lst:
    x.append(re.sub(r"('[^+\-*\/'\d]+')", r'\1', l))
for f in x:
    y.append(re.sub(r"('[^+\-*\/'\d]+')", r'df[\1]', f))
print(x)
print(y)

This gives:

['plug', 'plug+wallet', 'wallet-phone']
['plug', 'plug+wallet', 'wallet-phone']

Where am I going wrong? Am I missing anything in the first regex pattern or not passing the r'\1' properly?

Expected output:

x: [" 'plug'", "'plug'+'wallet'", "'wallet'-'phone'"]    
y: [" df['plug']", "df['plug']+df['wallet']", "df['wallet']-df['phone']"]

Solution

  • This works:

    import re
    lst = ["plug", "plug+wallet", "wallet-phone"]
    x = [re.sub(r"([^+\-*\/'\d]+)", r"'\1'", l) for l in lst]
    y = [re.sub(r"('[^+\-*\/'\d]+')", r"df[\1]", l) for l in x]
    print(x)
    print(y)
    

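    Your first regular expression required the single quotes to already be present in the input, so it matched nothing in the unquoted strings, and its replacement r'\1' did not add any quotes around the captured name. The corrected first pattern matches the bare name, and the replacement r"'\1'" wraps it in quotes.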

    Tested under Python 3.8.0.
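
If the intermediate quoted list is not needed, the two substitutions can probably be collapsed into one. The following is only a sketch: the names NAME, quoted_only and no_match are made up for illustration, and it assumes, like the patterns above, that column names contain no digits, quotes, or operator characters. It also shows that the pattern from the question finds no match in unquoted input, which is why re.sub returned the strings unchanged.

```python
import re

# Same character class as in the answer: a "name" is any run of characters
# that are not operators (+ - * /), single quotes, or digits.
NAME = r"[^+\-*/'\d]+"

lst = ["plug", "plug+wallet", "wallet-phone"]

# The pattern from the question requires literal quotes around the name,
# so it finds nothing in unquoted input and re.sub leaves the string as-is.
quoted_only = re.compile("('" + NAME + "')")
no_match = quoted_only.search("plug+wallet")
print(no_match)

# One-pass variant (illustrative): capture the bare name and add both the
# quotes and the df[...] wrapper in the replacement string.
y = [re.sub("(" + NAME + ")", r"df['\1']", s) for s in lst]
print(y)
```

Because the character class excludes digits, a name such as "col1" would be split at the digit, and surrounding whitespace would end up inside the quotes. If the real column names can contain either, the pattern would need to be adapted.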