Search code examples
pythonpandaspivot-tablemelt

How to create a single column of monthly values from rows with quarterly data with Pandas in Python?


My dataframe contains quarterly data and for some companies monthly data as well.

import pandas as pd
df = pd.DataFrame({'quarter': ['2010-1', '2010-2', '2010-3','2010-4', '2011-1'],
                  'volume_quarter': [450, 450, 450, 450, 450],
                  'volume_month_1': [150, 150, 150, 150, 150],
                  'volume_month_2': [160, 160, 160, 160, 160],
                  'volume_month_3': [140, 140, 140, 140, 140]})
df

Gives:

quarter volume_quarter  volume_month_1  volume_month_2  volume_month_3
2010-1  450               150            160               140
2010-2  450               150            160               140
2010-3  450               150            160               140
2010-4  450               150            160               140
2011-1  450               150            160               140

With the following code:

pd.melt(df, id_vars = ['quarter'], value_vars=['volume_month_1', "volume_month_2", "volume_month_3"])

I get:

    quarter variable    value
0   2010-1  volume_month_1  150
1   2010-2  volume_month_1  150
2   2010-3  volume_month_1  150
3   2010-4  volume_month_1  150
4   2011-1  volume_month_1  150
5   2010-1  volume_month_2  160
6   2010-2  volume_month_2  160
7   2010-3  volume_month_2  160
8   2010-4  volume_month_2  160
9   2011-1  volume_month_2  160
10  2010-1  volume_month_3  140
11  2010-2  volume_month_3  140
12  2010-3  volume_month_3  140
13  2010-4  volume_month_3  140
14  2011-1  volume_month_3  140

Instead I'm trying to achieve the following:


    quarter variable        value
0   2010-1  volume_month_1  150
1   2010-1  volume_month_2  160
2   2010-1  volume_month_3  140
3   2010-2  volume_month_1  150
4   2010-2  volume_month_2  160
5   2010-2  volume_month_3  140
6   2010-3  volume_month_1  150
7   2010-3  volume_month_2  160
8   2010-3  volume_month_3  140
9   2010-4  volume_month_1  150
10  2010-4  volume_month_2  160
11  2010-4  volume_month_3  140
12  2011-1  volume_month_1  150
13  2011-1  volume_month_2  160
14  2011-1  volume_month_3  140

I'd like to achieve this, so I can run the Arima model on the montly values.

Million thanks in advance !


Solution

  • You only missed sorting, this line of code:

    df = (
        pd.melt(
            df,
            id_vars=["quarter"],
            value_vars=["volume_month_1", "volume_month_2", "volume_month_3"],
        )
        .sort_values(by="quarter")
        .reset_index(drop=True)
    )
    

    returns as you desired:

       quarter        variable  value
    0   2010-1  volume_month_1    150
    1   2010-1  volume_month_2    160
    2   2010-1  volume_month_3    140
    3   2010-2  volume_month_1    150
    4   2010-2  volume_month_2    160
    5   2010-2  volume_month_3    140
    6   2010-3  volume_month_1    150
    7   2010-3  volume_month_2    160
    8   2010-3  volume_month_3    140
    9   2010-4  volume_month_1    150
    10  2010-4  volume_month_2    160
    11  2010-4  volume_month_3    140
    12  2011-1  volume_month_1    150
    13  2011-1  volume_month_2    160
    14  2011-1  volume_month_3    140