Search code examples
pythonpandasreindex

DataFrame balancing by using reindex


I have a DataFrame with an index called _ItemId and a _ChannelId column, plus other value columns simplified below as _SomeValue. I would like to transform the DataFrame to have exactly the same _ItemId elements for each _Channel modality (which I refer to as "balancing", like in panel data).

In the example below, in the df that I want, 3 rows were inserted with Nan values to balance the DataFrame.

I thought I would use the reindex() function but I am uncertain on how to create the index I need to insert the missing elements. Any suggestions much appreciated.

What I have:

       _ChannelId  _SomeValue
_ItemId                 
6559085   MICRO    AA
6589413   MICRO    AB
6589421   MICRO    AA
6781144   MICRO    AC
8184089   MICRO    AA
6559085  WILSON    AA
6589413  WILSON    AC

What I want:

       _ChannelId  _SomeValue
_ItemId                 
6559085   MICRO    AA
6589413   MICRO    AB
6589421   MICRO    AA
6781144   MICRO    AC
8184089   MICRO    AA
6559085  WILSON    AA
6589413  WILSON    AC
6589421  WILSON    NaN
6781144  WILSON    NaN
8184089  WILSON    NaN

Thanks


Solution

  • First get the indexes of the group you want to take as reference, for example:

    idx = df.groupby('_ChannelId').groups['MICRO']
    

    May be you want the union, unique indexes, indexes of the first group, etc, it is not clearly specified in the question. Then you can do:

    df.groupby('_ChannelId')\
      .apply(lambda x: x.reindex(idx))\
      .drop('_ChannelId', axis=1)\
      .reset_index('_ChannelId')
    
            _ChannelId  _SomeValue
    _ItemId     
    6559085 MICRO      AA
    6589413 MICRO      AB
    6589421 MICRO      AA
    6781144 MICRO      AC
    8184089 MICRO      AA
    6559085 WILSON     AA
    6589413 WILSON     AC
    6589421 WILSON     NaN
    6781144 WILSON     NaN
    8184089 WILSON     NaN