I have a MultiIndex (Name
, Date
) DataFrame df
that I need to process iteratively by Date
in order to assign a value that is based on both the current and previous Date's Group.
I can't tell how to do this when I have different Name
sets on different dates. I've marked the problematic row in the loop below with the comment HELP!!!
:
df = pd.DataFrame(data=[('A', '20210101', 5.0), # 1 Jan
('B', '20210101', 3.0),
('C', '20210101', 2.0),
('A', '20210102', 0.0), # 2 Jan
('C', '20210102', 0.0),
('A', '20210103', 0.0), # 3 Jan
('C', '20210103', 0.0),
('D', '20210103', 0.0)],
columns=('Name', 'Date', 'Dollars')).set_index(['Name', 'Date'])
# Logic: Each day total wealth sums to $10.
# Each day: Each person starts with what he had the previous day;
# excess wealth gets allocated evenly to everyone
dft = df.groupby(df.index.get_level_values('Date'))
dates = list(dft.groups.keys())
# Initialize first group:
previous = dft.get_group(dates[0])
# Loop over groups in order:
for date in dates[1:]:
current = dft.get_group(date)
current.Dollars = previous.Dollars # << HELP!!!
excess = 10.0 - current.Dollars.sum()
current.Dollars = current.Dollars + excess / current.Dollars.count()
# Assign the calculated values back to the DataFrame:
df.loc[current.index] = current
# Prepare for next iteration:
previous = current
At the HELP!!!
point I need to somehow accomplish the following:
Do a left-join-like assignment of current.Dollars
to previous.Dollars
, disregarding the Date
index on current
and previous
. Even if I do a .reset_index(level=1)
I don't see how I can accomplish this assignment given the fact that the Name
indexes can vary each day – Note that in the sample df
the second day loses row Name B
, and the third day there is a new row Name D
.
After disregarding the Date
level while doing the logic in the loop, I need to somehow recover it to assign the results back to the master df
on the corresponding Date
.
Dollars
Name Date
A 20210101 5.0
B 20210101 3.0
C 20210101 2.0
A 20210102 6.5
C 20210102 3.5
A 20210103 6.5
C 20210103 3.5
D 20210103 0.0
As shown in a reduced case here, you have to manipulate the indexes in order to do the assignments and then get the values back into the original DataFrame:
# Initialize first group, but take Date out of its index:
previous = dft.get_group(dates[0]).reset_index(level=1)
for date in dates[1:]:
# Take the Date out of the index:
current = dft.get_group(date).reset_index(level=1)
# Get the common keys
commonKeys = current.index.intersection(previous.index)
# Do the "left assignment"
current.loc[commonKeys, 'Dollars'] = previous.loc[commonKeys].Dollars
# Do your calculations:
excess = 10.0 - current.Dollars.sum()
current.Dollars = current.Dollars + excess / current.Dollars.count()
# Restore the Date column to the index:
current.set_index('Date', append=True, inplace=True)
# Assign the calculated values back to the DataFrame:
df.loc[current.index] = current
# Prepare for next iteration by removing the Date index again:
previous = current.reset_index(level=1)