I have a few lines of Python code that go through a list and remove punctuation from each row. Here is the code:
import pandas as pd
import re
data = [['M.B.B.S'], ['M.B.B.S,B.S'],['ACN-P, D.N.P'],['ACNP-BC, DNP']]
df = pd.DataFrame(data, columns = ['ID'])
p = re.compile(r'[^\w\s\d]+')
df['ID'] = [p.sub('',x) for x in df['ID'].tolist()]
df
The problem I am facing is that I need the periods and dashes (".", "-") to be removed with nothing in their place, as they are above, but the commas (",") to be replaced with spaces. I can't get the regular expression right. For example, the second row gives "MBBSBS" when I need it to read "MBBS BS".
Just replace the commas with spaces before applying the regex:
df['ID'] = [p.sub('',x.replace(',',' ')) for x in df['ID'].tolist()]
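If you would rather keep it to a single regex pass, a replacement function can decide what each match turns into. This is only a sketch: the helper name `clean` and the plain list `rows` are made up for illustration, and the pattern assumes you want a comma plus any whitespace around it to collapse into one space.

```python
import re

# Group 1: a comma with any surrounding whitespace.
# Otherwise: a run of punctuation that is not a comma.
pattern = re.compile(r'(\s*,\s*)|[^\w\s,]+')

def clean(text):
    # Comma matches become a single space; all other matches are dropped.
    return pattern.sub(lambda m: ' ' if m.group(1) else '', text)

rows = ['M.B.B.S', 'M.B.B.S,B.S', 'ACN-P, D.N.P', 'ACNP-BC, DNP']
cleaned = [clean(x) for x in rows]
print(cleaned)
```

With a DataFrame you would apply it the same way as above, e.g. `[clean(x) for x in df['ID'].tolist()]`.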
Or, just use the Python string method .translate and skip the regex entirely:
import pandas as pd
import string
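# Map every punctuation character to '' (i.e. delete it)...
repl = {ord(k): '' for k in string.punctuation}
# ...except the comma, which becomes a space.
repl[ord(',')] = ' '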
data = [['M.B.B.S'], ['M.B.B.S,B.S'],['ACN-P, D.N.P'],['ACNP-BC, DNP']]
df = pd.DataFrame(data, columns = ['ID'])
df['ID'] = [x.translate(repl) for x in df['ID'].tolist()]
>>> df
            ID
0         MBBS
1      MBBS BS
2    ACNP  DNP
3  ACNPBC  DNP
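The same table can also be built with `str.maketrans`, whose third argument lists characters to delete. A minimal sketch on a plain list (the names `to_delete`, `table` and `rows` are just for illustration):

```python
import string

# Every punctuation character except the comma gets deleted.
to_delete = string.punctuation.replace(',', '')

# Map ',' -> ' ' and delete everything in to_delete.
table = str.maketrans(',', ' ', to_delete)

rows = ['M.B.B.S', 'M.B.B.S,B.S', 'ACN-P, D.N.P', 'ACNP-BC, DNP']
cleaned = [x.translate(table) for x in rows]
print(cleaned)
```

Note that a comma followed by a space still ends up as two spaces here, since the comma becomes a space and the original space is kept.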
And if you don't want ', ' becoming two spaces, just replace those prior to the other replacements:
df['ID'] = [x.replace(', ',' ').translate(repl) for x in df['ID'].tolist()]
You get the idea...
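For completeness, the same two-step idea can probably be written with pandas' own vectorized string methods instead of a list comprehension. This is a sketch, assuming a pandas version where `Series.str.replace` accepts `regex=True`:

```python
import pandas as pd

data = [['M.B.B.S'], ['M.B.B.S,B.S'], ['ACN-P, D.N.P'], ['ACNP-BC, DNP']]
df = pd.DataFrame(data, columns=['ID'])

df['ID'] = (
    df['ID']
    .str.replace(r'\s*,\s*', ' ', regex=True)  # comma and adjacent whitespace -> one space
    .str.replace(r'[^\w\s]+', '', regex=True)  # drop the remaining punctuation
    .str.strip()                               # trim any leading/trailing space
)
print(df['ID'].tolist())
```

Doing the comma step first matters: once the second pattern has run, the commas are gone and there is nothing left to turn into a space.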