python pandas datetime dataframe data-cleaning

Most elegant way to join two unequal data frames in python and fill-in non-existent values with 0

Say I have two data frames x_2016 and y_2017 with the columns index, 0, 1, %, and date. I am interested in the columns index and %. The problem I'm running into is that I need them to be one dataframe where the index is W11 through W15 irrespective of the year for the values in the column %, but since the indices don't overlap completely in terms of weeks, I will have some rows that need to be filled in with 0.

x_2016


    index       0       1           %           date
    2016 W12    16.0    0           2.5         2016-03-28
    2016 W13    38.0    5.0         43.0        2016-04-04
    2016 W14    48.0    13.0        63.0        2016-04-11
    2016 W15    50.0    18.0        1.0         2016-04-18

y_2017

   index         0      1           %           date
   2017 W11     16.0    8.0         40.0        2017-03-13
   2017 W12     20.0    16.0        19.0        2017-03-27
   2017 W13     34.0    27.0        6.0         2017-04-03
   2017 W14     28.0    32.0        17.0        2017-04-10

Ultimately the dataframe should look like this:

final

index     %_2016          %_2017
W11       0               40.0
W12       2.5             19.0
W13       43.0            6.0
W14       63.0            17.0       
W15       1.0             0

What's the most elegant way to do this in python?

Solution

You need to extract the "Wxx" information from each data frame to a new column and then merge the data frames on that column. Lastly, select just the columns of interest and sort by the W values.

x_2016['W_index'] = x_2016['index'].str.extract('(W\d\d)', expand=True)

y_2017['W_index'] = y_2017['index'].str.extract('(W\d\d)', expand=True)

pd.merge(
    left=x_2016, 
    right=y_2017, 
    how='outer', 
    on='W_index', 
    suffixes=('_2016', '_2017'))[
        ['W_index', '%_2016', '%_2017']
    ].fillna(0).sort_values('W_index').reset_index(drop=True)

# returns:
   W_index  %_2016  %_2017
0      W11     0.0    40.0
1      W12     2.5    19.0
2      W13    43.0     6.0
3      W14    63.0    17.0
4      W15     1.0     0.0