Using df1 and df2 below, I would like to create df_new that has feat0, feat1, feat2 from df1 and their corresponding values from df2 as value_feat0, value_feat1, value_feat2. I have also created df_new as expected output.
df1 = pd.DataFrame([['Age', 'Education-Num', 'Sex'],
['Sex','Education-Num','Age'],
['Education-Num','Age','Sex'],
['Age', 'Education-Num', 'Sex'],
['Age', 'Education-Num', 'Sex']],
columns=['feat0', 'feat1', 'feat2'])
df2 = pd.DataFrame([[39.0, 7, 13.0, 4, 1, 0, 4, 1, 2174.0, 0.0, 40.0, 39],
[50.0, 6, 13.0, 2, 4, 4, 4, 1, 0.0, 0.0, 13.0, 39],
[38.0, 4, 9.0, 0, 6, 0, 4, 1, 0.0, 0.0, 40.0, 39],
[53.0, 4, 7.0, 2, 6, 4, 2, 1, 0.0, 0.0, 40.0, 39],
[28.0, 4, 13.0, 2, 10, 5, 2, 0, 0.0, 0.0, 40.0, 5]],
columns=['Age', 'Workclass', 'EducationNum', 'MaritalStatus', 'Occupation',
'Relationship', 'Race', 'Sex', 'CapitalGain', 'CapitalLoss',
'Hoursperweek', 'Country'])
Expected output:
df_new = pd.DataFrame([['Age', 39, 'EducationNum', 13, 'Sex', 1],
['Sex',1, 'EducationNum',13, 'Age', 50],
['EducationNum',9, 'Age',38, 'Sex', 1],
['Age', 38, 'EducationNum', 7, 'Sex', 1],
['Age', 28, 'EducationNum', 13, 'Sex', 0]],
columns=['feat0', 'value_feat0', 'feat1', 'value_feat1', 'feat2', 'value_feat2'])
I tried using melt() and merge the dataframes, but could not succeed.
You can use a for
loop on the columns of df1
, and then use this proposed method to lookup for your values in df2
:
df_new = pd.DataFrame()
for col in df1.columns:
df_new[col] = df1[col]
idx, cols = pd.factorize(df1[col])
df_new['value_'+col] = df2.reindex(cols, axis=1).to_numpy()[np.arange(len(df2)), idx]
Output:
feat0 value_feat0 feat1 value_feat1 feat2 value_feat2
0 Age 39.0 Education-Num 13.0 Sex 1.0
1 Sex 1.0 Education-Num 13.0 Age 50.0
2 Education-Num 9.0 Age 38.0 Sex 1.0
3 Age 53.0 Education-Num 7.0 Sex 1.0
4 Age 28.0 Education-Num 13.0 Sex 0.0
Alternatively this could be done with the deprecated lookup
method (not recommended):
df_new = pd.DataFrame()
for col in df1.columns:
df_new[col] = df1[col]
df_new['value_'+col] = df2.lookup(df1[col].index, df1[col].values)
Note: I used the column name Education-Num
in df2
to be consistent with df1
.