Search code examples
pandasdataframemulti-index

Iterate over multiindex dataframe groups with unequal second index level


I have a MultiIndex (Name, Date) DataFrame df that I need to process iteratively by Date in order to assign a value that is based on both the current and previous Date's Group.

I can't tell how to do this when I have different Name sets on different dates. I've marked the problematic row in the loop below with the comment HELP!!!:

    df = pd.DataFrame(data=[('A', '20210101', 5.0),  # 1 Jan
                            ('B', '20210101', 3.0),
                            ('C', '20210101', 2.0),
                            ('A', '20210102', 0.0),  # 2 Jan
                            ('C', '20210102', 0.0),
                            ('A', '20210103', 0.0),  # 3 Jan
                            ('C', '20210103', 0.0),
                            ('D', '20210103', 0.0)],
                      columns=('Name', 'Date', 'Dollars')).set_index(['Name', 'Date'])

    # Logic: Each day total wealth sums to $10.
    # Each day: Each person starts with what he had the previous day; 
    #           excess wealth gets allocated evenly to everyone
    
    dft = df.groupby(df.index.get_level_values('Date'))
    dates = list(dft.groups.keys())
    # Initialize first group:
    previous = dft.get_group(dates[0])

    # Loop over groups in order:
    for date in dates[1:]:
        current = dft.get_group(date)
        current.Dollars = previous.Dollars  # << HELP!!!
        excess = 10.0 - current.Dollars.sum()
        current.Dollars = current.Dollars + excess / current.Dollars.count()
        # Assign the calculated values back to the DataFrame:
        df.loc[current.index] = current
        # Prepare for next iteration:
        previous = current

At the HELP!!! point I need to somehow accomplish the following:

  1. Do a left-join-like assignment of current.Dollars to previous.Dollars, disregarding the Date index on current and previous. Even if I do a .reset_index(level=1) I don't see how I can accomplish this assignment given the fact that the Name indexes can vary each day – Note that in the sample df the second day loses row Name B, and the third day there is a new row Name D.

  2. After disregarding the Date level while doing the logic in the loop, I need to somehow recover it to assign the results back to the master df on the corresponding Date.


In this example the value of `df` after the loop finishes should be:
               Dollars
Name Date             
A    20210101      5.0
B    20210101      3.0
C    20210101      2.0
A    20210102      6.5
C    20210102      3.5
A    20210103      6.5
C    20210103      3.5
D    20210103      0.0

Solution

  • As shown in a reduced case here, you have to manipulate the indexes in order to do the assignments and then get the values back into the original DataFrame:

    # Initialize first group, but take Date out of its index:
    previous = dft.get_group(dates[0]).reset_index(level=1)
    
    for date in dates[1:]:
        # Take the Date out of the index:
        current = dft.get_group(date).reset_index(level=1)
    
        # Get the common keys
        commonKeys = current.index.intersection(previous.index)
    
        # Do the "left assignment"
        current.loc[commonKeys, 'Dollars'] = previous.loc[commonKeys].Dollars
    
        # Do your calculations:
        excess = 10.0 - current.Dollars.sum()
        current.Dollars = current.Dollars + excess / current.Dollars.count()
    
        # Restore the Date column to the index:
        current.set_index('Date', append=True, inplace=True)
    
        # Assign the calculated values back to the DataFrame:
        df.loc[current.index] = current
    
        # Prepare for next iteration by removing the Date index again:
        previous = current.reset_index(level=1)