Search code examples
pandassplitstring-concatenation

how to concatenate multiple columns dynamically where column names are stored as delimited string Pandas


I am trying to concatenate the columns dynamically. The column names are stored as delimited strings in column 'merge':

 import pandas as pd

 df = pd.DataFrame({'prodName': ['p1', 'p2', 'p3'],
               'mfr': ['m1', 'm2', 'm3'],
               'segment': ['s1','s2','s3'],
       'brand': ['b1','b2','b3'],
       'merge': ['mfr!segment','mfr!brand','segment!brand']})
 df

df output:

     prodName   mfr  segment   brand           merge
 0        p1     m1       s1      b1     mfr!segment  
 1        p2     m2       s2      b2       mfr!brand
 2        p3     m3       s3      b3   segment!brand

df expected output:

     prodName   mfr  segment   brand           merge   new_col
 0        p1     m1       s1      b1     mfr!segment      m1_s1    
 1        p2     m2       s2      b2       mfr!brand      m2_b2
 2        p3     m3       s3      b3   segment!brand      s3_b3

I have tried using pd.factorise but it expects 1D data. I have achieved desired output by iterating over rows

 for index, row in df:
     colstomerge=[col for col in row["merge"].lower().split("!")]
     df['new_col']=df.loc[:,colstomerge].astype(str).sum(axis=1)

I am looking for a more elegant and optimal solution, Thanks.


Solution

  • You can do the following - assign a temporary column with list of columns to be joined and merge using .join. It should work for any number of columns separated by '!' in the 'merge' column

    df["new_col"] = df.assign(cols=df["merge"].str.split("\!")).apply(
        lambda row: "_".join(row[row["cols"]].fillna("")), axis=1
    )
    
    print(df)
    

    df output:

      prodName mfr segment brand          merge new_col
    0       p1  m1      s1    b1    mfr!segment   m1_s1
    1       p2  m2      s2    b2      mfr!brand   m2_b2
    2       p3  m3      s3    b3  segment!brand   s3_b3