Search code examples
pythonpandaspanel-data

How to balance panel data by adding missing rows with no information?


I have an unbalanced dataset, namely unbalanced.df that looks as follows:

Date ID City State Quantity
2019-01 10001 Los Angeles CA 500
2019-02 10001 Los Angeles CA 995
2019-03 10001 Los Angeles CA 943
2019-01 10002 Houston TX 4330
2019-03 10002 Houston TX 2340
2019-01 10003 Sacramento CA 235
2019-02 10003 Sacramento CA 239
2019-03 10003 Sacramento CA 233

As you can see, Houston does not have 2019-02 as a Date. This happens all throughout my panel data with different cities.

I want to make this panel symmetric by adding NA rows on the missing dates, such that the new panel data looks like this, balanced.df:

Date ID City State Quantity
2019-01 10001 Los Angeles CA 500
2019-02 10001 Los Angeles CA 995
2019-03 10001 Los Angeles CA 943
2019-01 10002 Houston TX 4330
2019-02 10002 Houston TX NaN
2019-03 10002 Houston TX 2340
2019 -01 10003 Sacramento CA 235
2019-02 10003 Sacramento CA 239
2019-03 10003 Sacramento CA 233

In this case, I have an absolute minimum date and absolute maximum date, so I want to make sure that all cities follow the same dates. How can I fill my panel with NaN rows for cities and have therefore the same number of rows for each ID, City and State?


Solution

  • One option, that offers an efficient abstraction, is with complete from pyjanitor to get missing rows for the combination of Date vs the group of ('ID', 'City', 'State'):

    # pip install pyjanitor
    import pandas as pd
    import janitor
    
    df.complete(('ID', 'City', 'State'), 'Date')
    
          Date     ID         City State  Quantity
    0  2019-01  10001  Los Angeles    CA     500.0
    1  2019-02  10001  Los Angeles    CA     995.0
    2  2019-03  10001  Los Angeles    CA     943.0
    3  2019-01  10002      Houston    TX    4330.0
    4  2019-02  10002      Houston    TX       NaN
    5  2019-03  10002      Houston    TX    2340.0
    6  2019-01  10003   Sacramento    CA     235.0
    7  2019-02  10003   Sacramento    CA     239.0
    8  2019-03  10003   Sacramento    CA     233.0