I have two datasets as follows:
a:
have lunch | have dinner |
---|---|
set a | set 1 |
set b | set 2 |
... | ... |
b:
variable | variable_value | value |
---|---|---|
have lunch | set a | 0.2 |
have lunch | set b | 0.5 |
have dinner | set 1 | 1.5 |
have dinner | set 2 | 1.7 |
... | ... | ... |
My expected outcome:
c:
have lunch | have lunch value | have dinner | have dinner value |
---|---|---|---|
set a | 0.2 | set 1 | 1.5 |
set b | 0.5 | set 2 | 1.7 |
... | ... | ... | ... |
Is there anyway to merge these two datasets in python/r?
I can think of one solution in R:
library(tidyverse)
library(magrittr)
b %<>% pivot_wider(names_from = variable, values_from = value)
left_join(a,b)
You can melt
, merge
, then pivot
:
out = (a
.melt(ignore_index=False, value_name='variable_value')
.reset_index()
.merge(b, how='left')
.pivot(index='index', columns='variable')
)
Output:
variable_value value
variable have dinner have lunch have dinner have lunch
index
0 set 1 set a 1.5 0.2
1 set 2 set b 1.7 0.5
For a formatting closer to what you have, further reprocess the column names:
out = (a.melt(ignore_index=False, value_name='variable_value')
.reset_index()
.merge(b, how='left')
.pivot(index='index', columns='variable')
.sort_index(level=1, axis=1)
.rename_axis(index=None)
)
out.columns = out.columns.map(lambda x: f'{x[1]}{" value" if x[0] == "variable_value" else ""}')
Output:
have dinner have dinner value have lunch have lunch value
0 1.5 set 1 0.2 set a
1 1.7 set 2 0.5 set b
Or, using a loop and concat
:
out = pd.concat([a[[col]].merge(b.loc[b['variable'].eq(col),
['variable_value', 'value']]
.rename(columns={'variable_value': col,
'value': f'{col} value'
}),
how='left'
)
for col in a], axis=1)
Output:
have lunch have lunch value have dinner have dinner value
0 set a 0.2 set 1 1.5
1 set b 0.5 set 2 1.7