
Pandas duplicate rows replacing one column value


I have a dataframe train_df that looks like this (a simplified example; the real one has many more rows):

term               text_snippet        abbr   label
Operatiekamer      De OK is open       OK      1

I have another dataframe abbr_df that looks like this:

abbr    term
OK      Operatiekamer
OK      Operatiekledij

I want to supplement train_df with extra rows that keep the same text snippet and abbr as above, but carry the wrong term for that abbr and the label 0. For example:

term               text_snippet        abbr   label
Operatiekamer      De OK is open       OK      1
Operatiekledij     De OK is open       OK      0

I feel like there is an elegant way to achieve this, but I just can't get it to work. Can anyone help me out?


Solution

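  • This is similar to the previous answer in that it merges the two dataframes, but it handles the columns differently: merge abbr_df with train_df on abbr so every candidate term gets a row, then set label to 0 wherever the candidate term differs from the original one: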

    import pandas as pd

    train_df = pd.DataFrame({"term": ["Operatiekamer"], "text_snippet": ["De OK is open"],
                             "abbr": ["OK"], "label": [1]})
    abbr_df = pd.DataFrame({"abbr": ["OK", "OK"], "term": ["Operatiekamer", "Operatiekledij"]})

    # One row per candidate term; the original term is kept as "term_train".
    train_df = abbr_df.merge(train_df, on="abbr", how="inner", suffixes=(None, "_train"))

    # A candidate that differs from the original term is a negative example.
    train_df.loc[train_df["term"] != train_df["term_train"], "label"] = 0
    train_df = train_df.drop(columns="term_train")
    print(train_df)
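
Since the real train_df has many more rows, the same merge can be wrapped in a function and checked on more than one abbreviation. The following is only a sketch: the helper name add_negative_examples and the second ("IC") row are made up for illustration and are not from the question. It also assumes that abbr_df lists the correct term for every abbr in train_df, because the inner merge would otherwise drop the original positive row.

```python
import pandas as pd


def add_negative_examples(train_df, abbr_df):
    """Hypothetical helper: one row per candidate term for each snippet's abbr."""
    merged = abbr_df.merge(train_df, on="abbr", how="inner", suffixes=(None, "_train"))
    # Keep the original label where the candidate is the original term, else 0.
    merged["label"] = merged["label"].where(merged["term"] == merged["term_train"], 0)
    # Drop the helper column and restore the original column order.
    merged = merged.drop(columns="term_train")[list(train_df.columns)]
    return merged.reset_index(drop=True)


# Illustrative data: the "IC" rows are invented for this sketch.
train_df = pd.DataFrame({
    "term": ["Operatiekamer", "Intensive care"],
    "text_snippet": ["De OK is open", "De IC is vol"],
    "abbr": ["OK", "IC"],
    "label": [1, 1],
})
abbr_df = pd.DataFrame({
    "abbr": ["OK", "OK", "IC", "IC"],
    "term": ["Operatiekamer", "Operatiekledij", "Intensive care", "Informed consent"],
})

result = add_negative_examples(train_df, abbr_df)
print(result)
```

Each snippet appears once per candidate term of its abbr, and only the row with the original term keeps its label; every other row gets label 0.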