I'm trying to replace a lot of strings with other strings defined in "replaceWord". The example below has only a few strings, but I actually have thousands.
However, the code I wrote does not work as I expected.
After running the script, the output is:
before after
0 test1234 test1234
1 test1234 test1234
2 test1234 1349
3 test1234 test1234
4 test1234 test1234
I need the output to be:
before after
1 test1234 1349
2 test9012 te1210st
3 test5678 8579
4 I was born april I was born August
5 i like mcdonalds i like chicken
Script:
import os.path, time, re
import pandas as pd
import csv
body01_before="test1234"
body02_before="test9012"
body03_before="test5678"
body04_before="i like mcdonalds"
body05_before="I was born april"
replaceWord = [
["test9012","te1210st"],
["test5678","8579"],
["test1234","1349"],
["april","August"],
["mcdonalds","chicken"],
]
cols = ['before','after']
df = pd.DataFrame(index=[], columns=cols)
for word in replaceWord:
    body01_after = re.sub(word[0], word[1], body01_before)
    body02_after = re.sub(word[0], word[1], body02_before)
    body03_after = re.sub(word[0], word[1], body03_before)
    body04_after = re.sub(word[0], word[1], body04_before)
    body05_after = re.sub(word[0], word[1], body05_before)
    df=df.append({'before':body01_before,'after':body01_after}, ignore_index=True)
#df.head()
print(df)
df.to_csv('test_replace.csv')
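The loop above recomputes every `bodyNN_after` from the unchanged `bodyNN_before` on each pass, so each pass applies only one pair to the original text, and only `body01` is appended to the DataFrame (once per pair, which is why `test1234` appears five times). A minimal sketch of the intended logic, assuming the bodies are collected in a list first (the names `bodies` and `rows` are hypothetical):

```python
import re

# Hypothetical: the body0N_before strings collected in one list
bodies = ["test1234", "test9012", "test5678", "i like mcdonalds", "I was born april"]
replaceWord = [
    ["test9012", "te1210st"],
    ["test5678", "8579"],
    ["test1234", "1349"],
    ["april", "August"],
    ["mcdonalds", "chicken"],
]

rows = []
for before in bodies:
    after = before
    # Apply every pair to the running result, not to the original text
    for old, new in replaceWord:
        # re.escape so characters such as "." in a search string match literally
        after = re.sub(re.escape(old), new, after)
    # One row per body, appended once after all pairs are applied
    rows.append({"before": before, "after": after})

for row in rows:
    print(row["before"], "->", row["after"])
```

`pd.DataFrame(rows, columns=['before', 'after'])` then gives one row per body and avoids `DataFrame.append`, which was removed in pandas 2.0. Because the pairs are applied one after another, a replacement's output can be rewritten again by a later pair if it happens to contain that pair's search string.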
Use a regular expression that captures the non-digits (\D+) as the first group and the digits (\d+) as the second group, then build the replacement from the second group \2 followed by the first group \1:
df['after'] = df['before'].str.replace(r'(\D+)(\d+)', r'\2\1', regex=True)
df
before after
1 test1234 1234test
2 test9012 9012test
3 test5678 5678test
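The same group swap can be tried on individual strings with `re.sub`; `str.replace(..., regex=True)` performs essentially the same substitution on each element of the column. A small sketch (the list `samples` is only illustrative):

```python
import re

# Group 1: a run of non-digits; group 2: the run of digits that follows it
pattern = re.compile(r"(\D+)(\d+)")

samples = ["test1234", "test9012", "test5678", "i like mcdonalds"]
# \2\1 writes the digits first, then the letters
swapped = [pattern.sub(r"\2\1", s) for s in samples]
print(swapped)  # ['1234test', '9012test', '5678test', 'i like mcdonalds']
```

Strings without digits pass through unchanged. This rule only rearranges text of the letters-then-digits shape; it does not use the "replaceWord" mapping.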
It seems you do not have a DataFrame yet, only separate variables:
body01_before="test1234"
body02_before="test9012"
body03_before="test5678"
body04_before="i like mcdonalds"
body05_before="I was born april"
replaceWord = [
["test9012","te1210st"],
["test5678","8579"],
["test1234","1349"],
["april","August"],
["mcdonalds","chicken"],
]
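import re
import pandas as pd
# Gather the names of the body0N_before variables in a list
# (var_names rather than vars, so the built-in vars() is not shadowed;
# matching "_before" explicitly skips any body0N_after left over from the question's loop)
var_names = re.findall(r'body0\d_before', ','.join(globals().keys()))
df = pd.DataFrame(var_names, columns=['before_1'])
# Obtain the value of each variable by name (avoids eval)
df['before'] = df['before_1'].apply(lambda name: globals()[name])
# Build the lookup dict once instead of on every match
mapping = dict(replaceWord)
# Replacement function: swap the word if it is a key, otherwise keep it
repl = lambda m: mapping.get(m[0], m[0])
# Do the replacement word by word
df['after'] = df['before'].str.replace(r'(\w+)', repl, regex=True)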
df
before_1 before after
0 body01_before test1234 1349
1 body02_before test9012 te1210st
2 body03_before test5678 8579
3 body04_before i like mcdonalds i like chicken
4 body05_before I was born april I was born August
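The `(\w+)` approach above only replaces whole words, because each match is a complete word that is then looked up in the dict; a search string containing a space, or one that sits inside a longer word, is never replaced. One alternative (not from the answer above) is to compile all search strings into a single alternation, so each text is scanned once even with thousands of pairs. This sketch assumes the search strings are literal text (hence `re.escape`) and sorts them longest first, because Python's alternation takes the first alternative that matches at a given position:

```python
import re

replaceWord = [
    ["test9012", "te1210st"],
    ["test5678", "8579"],
    ["test1234", "1349"],
    ["april", "August"],
    ["mcdonalds", "chicken"],
]
mapping = dict(replaceWord)

# One pattern for all keys, longest first so a longer key beats a shorter prefix
pattern = re.compile("|".join(re.escape(k) for k in sorted(mapping, key=len, reverse=True)))

def replace_all(text):
    # Every match is exactly one key; replaced text is not scanned again
    return pattern.sub(lambda m: mapping[m.group(0)], text)

bodies = ["test1234", "test9012", "test5678", "i like mcdonalds", "I was born april"]
print([replace_all(b) for b in bodies])
```

On a Series, `df['before'].str.replace(pattern, lambda m: mapping[m.group(0)], regex=True)` should behave the same way, since `str.replace` accepts a compiled pattern and a callable replacement. Unlike `(\w+)`, this also replaces keys inside longer words; wrapping the alternation as `\b(?:...)\b` restricts it to whole words.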