python pandas dataframe python-re

Python: remove leading text from a column if it matches another column


In my DataFrame, I want to remove the text at the start of one column when it matches the text in another column. Example DataFrame:

name        var1
John Smith  John Smith Hello world
Mary Jane   Mary Jane Python is cool
James Bond  My name is James Bond
Peter Pan   Nothing happens here

The DataFrame I want:

name        var1
John Smith  Hello world
Mary Jane   Python is cool
James Bond  My name is James Bond
Peter Pan   Nothing happens here

Something as simple as:

df[~df.var1.str.contains(df.var1)]

does not work. How should I write my Python code?


Solution

  • Try using apply with a lambda, row by row (axis=1):

    # Strip the leading name from var1 only when var1 starts with it; otherwise keep var1 unchanged.
    df["var1"] = df.apply(lambda x: x["var1"][len(x["name"]):].strip() if x["var1"].startswith(x["name"]) else x["var1"], axis=1)
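
For reference, here is a minimal, self-contained sketch of the same idea run against the sample data from the question. It is an alternative illustration, not the answer's exact code: the helper name strip_leading_name is hypothetical, and the sketch assumes both columns hold plain strings with no missing values.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "name": ["John Smith", "Mary Jane", "James Bond", "Peter Pan"],
    "var1": [
        "John Smith Hello world",
        "Mary Jane Python is cool",
        "My name is James Bond",
        "Nothing happens here",
    ],
})


def strip_leading_name(name: str, text: str) -> str:
    """Hypothetical helper: drop `name` from the start of `text` if present."""
    if text.startswith(name):
        # Cut off the name, then trim the whitespace that followed it.
        return text[len(name):].strip()
    return text


# Pair the two columns row by row; assumes every value is a string (no NaN).
df["var1"] = [strip_leading_name(n, t) for n, t in zip(df["name"], df["var1"])]

print(df)
```

Only rows where var1 begins with the name are changed, so "My name is James Bond" is left alone because the name appears at the end rather than the start. Note that a plain prefix check would also trim a longer word that merely begins with the name; if that matters for your data, compare against the name followed by a space instead.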