Search code examples
pythonpandasdataframemulti-index

Create Multi-Index empty DataFrame to join with main DataFrame


Suppose that I have a dataframe which can be created using code below

df = pd.DataFrame(data = {'date':['2021-01-01', '2021-01-02', '2021-01-05','2021-01-02', '2021-01-03', '2021-01-05'],
                          'product':['A', 'A', 'A', 'B', 'B', 'B'],
                          'price':[10, 20, 30, 40, 50, 60]
                         }
                 )
df['date'] = pd.to_datetime(df['date'])

I want to create an empty dataframe let's say main_df which will contain all dates between df.date.min() and df.date.max() for each product and on days where values in nan I want to ffill and bfill for remaning. The resulting dataframe would be as below:

+------------+---------+-------+
|    date    | product | price |
+------------+---------+-------+
| 2021-01-01 | A       |    10 |
| 2021-01-02 | A       |    20 |
| 2021-01-03 | A       |    20 |
| 2021-01-04 | A       |    20 |
| 2021-01-05 | A       |    30 |
| 2021-01-01 | B       |    40 |
| 2021-01-02 | B       |    40 |
| 2021-01-03 | B       |    50 |
| 2021-01-04 | B       |    50 |
| 2021-01-05 | B       |    60 |
+------------+---------+-------+

Solution

  • First

    make pivot table, upsampling by asfreq and fill null

    df.pivot_table('price', 'date', 'product').asfreq('D').ffill().bfill()
    

    output:

    product     A       B
    date        
    2021-01-01  10.0    40.0
    2021-01-02  20.0    40.0
    2021-01-03  20.0    50.0
    2021-01-04  20.0    50.0
    2021-01-05  30.0    60.0
    



    Second

    stack result and so on (include full code)

    (df.pivot_table('price', 'date', 'product').asfreq('D').ffill().bfill()
     .stack().reset_index().rename(columns={0:'price'})
     .sort_values('product').reset_index(drop=True))
    

    output:

        date        product price
    0   2021-01-01  A       10.0
    1   2021-01-02  A       20.0
    2   2021-01-03  A       20.0
    3   2021-01-04  A       20.0
    4   2021-01-05  A       30.0
    5   2021-01-01  B       40.0
    6   2021-01-02  B       40.0
    7   2021-01-03  B       50.0
    8   2021-01-04  B       50.0
    9   2021-01-05  B       60.0