Say I have two data frames, x_2016 and y_2017, with the columns index, 0, 1, %, and date. I am interested in the columns index and %. The problem I'm running into is that I need them to be one dataframe where the index is W11 through W15, irrespective of the year, for the values in the column %. Since the indices don't overlap completely in terms of weeks, I will have some rows that need to be filled in with 0.
x_2016
index        0     1     %  date
2016 W12  16.0     0   2.5  2016-03-28
2016 W13  38.0   5.0  43.0  2016-04-04
2016 W14  48.0  13.0  63.0  2016-04-11
2016 W15  50.0  18.0   1.0  2016-04-18
y_2017
index        0     1     %  date
2017 W11  16.0   8.0  40.0  2017-03-13
2017 W12  20.0  16.0  19.0  2017-03-27
2017 W13  34.0  27.0   6.0  2017-04-03
2017 W14  28.0  32.0  17.0  2017-04-10
Ultimately the dataframe should look like this:
final
index  %_2016  %_2017
W11         0    40.0
W12       2.5    19.0
W13      43.0     6.0
W14      63.0    17.0
W15       1.0       0
What's the most elegant way to do this in Python?
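You need to extract the "Wxx" information from each data frame into a new column and then do an outer merge of the data frames on that column, so that weeks present in only one year are kept. Finally, select just the columns of interest, fill the missing values with 0, and sort by the W values.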
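import pandas as pd

# Pull the week label (e.g. "W12") out of the "YYYY Wxx" strings.
# A raw string avoids the invalid-escape warning for \d; expand=False returns a Series.
x_2016['W_index'] = x_2016['index'].str.extract(r'(W\d\d)', expand=False)
y_2017['W_index'] = y_2017['index'].str.extract(r'(W\d\d)', expand=False)

# The outer merge leaves NaN where a week is missing in one year; fillna(0) replaces it.
pd.merge(
    left=x_2016,
    right=y_2017,
    how='outer',
    on='W_index',
    suffixes=('_2016', '_2017'))[
        ['W_index', '%_2016', '%_2017']
].fillna(0).sort_values('W_index').reset_index(drop=True)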
# returns:
  W_index  %_2016  %_2017
0     W11     0.0    40.0
1     W12     2.5    19.0
2     W13    43.0     6.0
3     W14    63.0    17.0
4     W15     1.0     0.0
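
If you want to try this end to end, here is a self-contained sketch that rebuilds the sample data from the question and gets the same table a slightly different way: instead of merging, it keys each year's % values by week label and lets pd.concat align them. The helper name week_percent is made up for illustration, and the sketch assumes each week appears at most once per data frame.

```python
import pandas as pd

# Sample data copied from the question (only the columns that matter here).
x_2016 = pd.DataFrame({
    'index': ['2016 W12', '2016 W13', '2016 W14', '2016 W15'],
    '%': [2.5, 43.0, 63.0, 1.0],
})
y_2017 = pd.DataFrame({
    'index': ['2017 W11', '2017 W12', '2017 W13', '2017 W14'],
    '%': [40.0, 19.0, 6.0, 17.0],
})


def week_percent(df, year):
    """Hypothetical helper: return the '%' column keyed by week label."""
    weeks = df['index'].str.extract(r'(W\d\d)', expand=False)
    return pd.Series(df['%'].to_numpy(), index=pd.Index(weeks), name=f'%_{year}')


# Concatenating along axis=1 aligns on the week label and takes the union
# of the weeks; a week missing from one year becomes NaN, then 0.
final = (
    pd.concat([week_percent(x_2016, 2016), week_percent(y_2017, 2017)], axis=1)
    .fillna(0)
    .sort_index()
    .rename_axis('index')
)
print(final)
```

This version keeps the week as the index of the result rather than a column, which matches the layout asked for in the question, and it extends to more years by adding more entries to the list passed to pd.concat.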