Search code examples
pythonpandasmulti-index

How to insert missing rows in multi-index


I have a dataframe that looks something like this

df = pd.DataFrame({'year':[23,23,23,23,23,23], 'month':[1,1,1,2,3,3], 'utility':['A','A','B','A','A','B'], 'state':['NY','NJ','CA','NJ','NY','CA']})

   year  month utility state
0    23      1       A    NY
1    23      1       A    NJ
2    23      1       B    CA
3    23      2       A    NJ
4    23      3       A    NY
5    23      3       B    CA

And I would like to create new rows for utilities-state combinations with missing months. So the new dataframe would look something like this

   year  month utility state
0    23      1       A    NY
1    23      1       A    NJ
2    23      1       B    CA
3    23      2       A    NY
4    23      2       A    NJ
5    23      2       B    CA
6    23      3       A    NY
7    23      3       A    NJ
8    23      3       B    CA

I know that I can use a MultiIndex and then reindex, but using the from_product() method results in utility-state combinations not present in the original df (I do not want a utility A - CA combination, for example).

I thought about concatenating the utility and state columns and then getting the cartesian product from that, but I think there must be a simpler method.


Solution

  • One option is with DataFrame.complete from . For your data, you are basically doing a combination of (year, month) and (utility, state):

    # pip install pyjanitor
    import janitor 
        ​
    df.complete(('year', 'month'), ('utility', 'state'))
       year  month utility state
    0    23      1       A    NY
    1    23      1       A    NJ
    2    23      1       B    CA
    3    23      2       A    NY
    4    23      2       A    NJ
    5    23      2       B    CA
    6    23      3       A    NY
    7    23      3       A    NJ
    8    23      3       B    CA
    

    @Timeless, undelete your code and I'll delete mine. You had a good start, and I edited your code to make it simpler.