python, dataframe, replace, data-processing

Python: how to replace text matching a fixed pattern in one DataFrame column with the value from another column?


For example, how can I replace <Isis/> with twins in the first row, and do the same for every row in the table?

I tried the following code, but Python raises: "TypeError: replace() argument 1 must be str, not None"

import pandas as pd 
import re

df = pd.read_csv('train.csv')

p = re.compile('<\w+/>')

df['original'] = df.apply(lambda x: x['original'].replace(
    p.match(x['original']), str(x['edit'])), axis = 1)

print(df.head())

Any help would be much appreciated, thank you!

I expect the code to return a DataFrame in which "France is ‘ hunting down its citizens who joined <Isis/> ’ without trial in Iraq" is changed to "France is ‘ hunting down its citizens who joined twins ’ without trial in Iraq".


Solution

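  • p.match() only matches at the start of the string, so it returns None when the tag appears mid-sentence, and str.replace() then fails with the TypeError above. You can let re.sub do both the search and the replacement in one step: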

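    import re

    # Replace the <...> tag in 'original' with the same row's 'edit' value;
    # str() guards against non-string values in 'edit'
    df['original'] = df.apply(lambda x: re.sub(r"<.*?>", str(x['edit']), x['original']), axis=1)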
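    For anyone who wants to try this without train.csv, here is a minimal self-contained sketch. The two-row DataFrame is made-up sample data (only the first headline comes from the question), and the fill_tag helper is a hypothetical name used for illustration. It also shows why the original attempt failed: match() returns None when the tag is not at the start of the string, while search() finds it.

```python
import re

import pandas as pd

# Made-up sample standing in for train.csv (assumption: columns 'original' and 'edit')
df = pd.DataFrame({
    "original": [
        "France is ‘ hunting down its citizens who joined <Isis/> ’ without trial in Iraq",
        "A headline without any tag",
    ],
    "edit": ["twins", "unused"],
})

tag = re.compile(r"<\w+/>")
headline = df.loc[0, "original"]

# match() anchors at position 0, so it finds nothing here; search() scans the whole string
match_result = tag.match(headline)
search_result = tag.search(headline)
print(match_result)           # None
print(search_result.group())  # <Isis/>


def fill_tag(row):
    """Hypothetical helper: put the row's 'edit' value where the tag is."""
    # A function as the replacement inserts the text literally, so any
    # backslashes in 'edit' are not treated as regex escapes
    return tag.sub(lambda m: str(row["edit"]), row["original"])


df["original"] = df.apply(fill_tag, axis=1)
print(df["original"].tolist())
```

    Rows without a tag are left unchanged, since re.sub returns the input string when nothing matches.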