I have the following data-frame (read from a csv file):
my_df:
my_date my_id values key factor
1/1/2024 _One 123 key1 .56
1/7/2024 _One 567 key1 .75
1/14/2024 _One 100 key1 .81
1/14/2024 _One 100 key2 .44
1/1/2024 _Two 150 key3 .91
1/7/2024 _Two 130 key3 .88
1/1/2024 _Three 200 key4 0
1/1/2024 _Three 200 key5 .45
So, there is an overlap of certain dates for two or more keys belonging to the same 'id'. What i want my data-frame to look like is as follows, that is, i need to calculate the allocated values based on the factor weights. Note: the calculated weight is obtained by dividing the factor by the sum of the factors in the overlapping periods. Say,
my_df:
my_date my_id values key factor weights allocated_values
1/1/2024 _One 123 key1 0.56 1 123
1/7/2024 _One 500 key1 0.75 1 500
1/14/2024 _One 100 key1 0.81 0.648 64.8
1/14/2024 _One 100 key2 0.44 0.352 35.2
1/1/2024 _Two 160 key3 0.91 1 160
1/7/2024 _Two 130 key3 0.88 1 130
1/1/2024 _Three 200 key4 0 0.50 100
1/1/2024 _Three 200 key5 0.45 0.50 100
To achieve the above result, i am doing the following group by:
for name, group in my_df.groupby('my_id'):
for name1, group1 in group.groupby('key'):
factors = group1['factor']
weight = factors['factor']/factors.sum() if factors.sum() != 0 | (factors==0).any() else 1/len(factors)
#what i tried- approach1
group['weights'] = weight #doesn't work
#what i tried next
my_df['weights'] = my_df.update(group) #doesn't work
I am so tired now, unable to think any further. So posting it here for any help/guidance.
Would much appreciate any hints.
You could try to use .groupby
(over my_id
and my_date
) in combination with .transform
to get that done:
df["allocated_values"] = (
df["values"] * df["factor"]
/ df.groupby(["my_id", "my_date"])["factor"].transform("sum")
)
Result for the sample:
my_date my_id values key factor allocated_values
0 1/1/2024 _One 123 key1 0.56 123.0
1 1/7/2024 _One 567 key1 0.75 567.0
2 1/14/2024 _One 100 key1 0.81 64.8
3 1/14/2024 _One 100 key2 0.44 35.2
4 1/1/2024 _Two 150 key3 0.91 150.0
5 1/7/2024 _Two 130 key3 0.88 130.0
If you want to keep the weights
then you could try:
df = (
df.assign(weights=
df["factor"] / df.groupby(["my_id", "my_date"])["factor"].transform("sum"))
.assign(allocated_values=lambda df: df["values"] * df["weights"])
)
Result for the sample:
my_date my_id values key factor allocated_values weights
0 1/1/2024 _One 123 key1 0.56 123.0 1.000
1 1/7/2024 _One 567 key1 0.75 567.0 1.000
2 1/14/2024 _One 100 key1 0.81 64.8 0.648
3 1/14/2024 _One 100 key2 0.44 35.2 0.352
4 1/1/2024 _Two 150 key3 0.91 150.0 1.000
5 1/7/2024 _Two 130 key3 0.88 130.0 1.000
Regarding the modification: I'm not sure that I completly understand the specification, but you could try to modify the factor
s of those groups that contain a 0
by setting it to 1
, so the resprective weights would be uniform (1 / length of group
):
df = (
df.assign(mask=df["factor"].ne(0))
.assign(mask=lambda df: df.groupby(["my_id", "my_date"])["mask"].transform("all"))
.assign(factor_alt=lambda df: df["factor"].where(df["mask"], 1))
.assign(weights=lambda df: df["factor_alt"]
/ df.groupby(["my_id", "my_date"])["factor_alt"].transform("sum"))
.assign(allocated_values=lambda df: df["values"] * df["weights"])
.drop(columns=["mask", "factor_alt"])
)
The first 2 steps identify the groups in which a 0 occurs, and the 3. step adjusts the factor
s accordingly. Then it's the same as before, just with the adjusted factor
s.
I'm getting the following result with the modified input:
my_date my_id values key factor allocated_values weights
0 1/1/2024 _One 123 key1 0.56 123.0 1.000
1 1/7/2024 _One 567 key1 0.75 567.0 1.000
2 1/14/2024 _One 100 key1 0.81 64.8 0.648
3 1/14/2024 _One 100 key2 0.44 35.2 0.352
4 1/1/2024 _Two 150 key3 0.91 150.0 1.000
5 1/7/2024 _Two 130 key3 0.88 130.0 1.000
6 1/1/2024 _Three 200 key4 0.00 100.0 0.500
7 1/1/2024 _Three 200 key5 0.45 100.0 0.500