I am trying to concatenate the columns dynamically. The column names are stored as delimited strings in column 'merge':
import pandas as pd
df = pd.DataFrame({'prodName': ['p1', 'p2', 'p3'],
'mfr': ['m1', 'm2', 'm3'],
'segment': ['s1','s2','s3'],
'brand': ['b1','b2','b3'],
'merge': ['mfr!segment','mfr!brand','segment!brand']})
df
df
output:
prodName mfr segment brand merge
0 p1 m1 s1 b1 mfr!segment
1 p2 m2 s2 b2 mfr!brand
2 p3 m3 s3 b3 segment!brand
df
expected output:
prodName mfr segment brand merge new_col
0 p1 m1 s1 b1 mfr!segment m1_s1
1 p2 m2 s2 b2 mfr!brand m2_b2
2 p3 m3 s3 b3 segment!brand s3_b3
I have tried using pd.factorise
but it expects 1D data.
I have achieved desired output by iterating over rows
for index, row in df:
colstomerge=[col for col in row["merge"].lower().split("!")]
df['new_col']=df.loc[:,colstomerge].astype(str).sum(axis=1)
I am looking for a more elegant and optimal solution, Thanks.
You can do the following - assign a temporary column with list of columns to be joined and merge using .join. It should work for any number of columns separated by '!' in the 'merge' column
df["new_col"] = df.assign(cols=df["merge"].str.split("\!")).apply(
lambda row: "_".join(row[row["cols"]].fillna("")), axis=1
)
print(df)
df
output:
prodName mfr segment brand merge new_col
0 p1 m1 s1 b1 mfr!segment m1_s1
1 p2 m2 s2 b2 mfr!brand m2_b2
2 p3 m3 s3 b3 segment!brand s3_b3