I think this should be a relatively simple question, I'm just not sure how to do it. I have two dataframes. The first dataframe includes positions for each day and each day includes multiple positions.
DF1:
Date | InstrumentID | DollarAmt |
---|---|---|
6/17/2024 | ABC | 1237710.88 |
6/17/2024 | DEF | 3785016.6 |
6/17/2024 | GHI | 8073892.08 |
6/17/2024 | JKL | 1239084.56 |
6/17/2024 | MNO | 1981981.96 |
6/17/2024 | PQR | 741914.0625 |
6/17/2024 | STU | 482246.0938 |
6/17/2024 | VWX | 747128.9063 |
6/17/2024 | YZ | 6231908.57 |
6/18/2024 | ABC | 1223553.78 |
6/18/2024 | DEF | 3742955.56 |
6/18/2024 | GHI | 8055188.67 |
6/18/2024 | JKL | 1228704.36 |
6/18/2024 | MNO | 1961685.05 |
6/18/2024 | PQR | 742089.8438 |
6/18/2024 | STU | 482353.5156 |
6/18/2024 | VWX | 747246.0938 |
6/18/2024 | YZ | 6231908.57 |
6/18/2024 | ACD | 94344.13 |
I also have another dataframe, where I've separately calculated daily totals of the DollarAmt column from the first dataframe.
DF2:
Date | Total |
---|---|
6/17/2024 | 24520883.71 |
6/18/2024 | 24510029.57 |
What I'm looking for is the following, where a separate column in the first dataframe is created and displays the daily totals from the second dataframe, as below. I'm still new to python so the best way I could come up with to do this was to create a separate dataframe with the totals and then merge. But that end output isn't exactly what I'm looking for. The desired result is below:
Desired Output:
Date | InstrumentID | DollarAmt | Total |
---|---|---|---|
6/17/2024 | ABC | 1237710.88 | 24520883.71 |
6/17/2024 | DEF | 3785016.6 | 24520883.71 |
6/17/2024 | GHI | 8073892.08 | 24520883.71 |
6/17/2024 | JKL | 1239084.56 | 24520883.71 |
6/17/2024 | MNO | 1981981.96 | 24520883.71 |
6/17/2024 | PQR | 741914.0625 | 24520883.71 |
6/17/2024 | STU | 482246.0938 | 24520883.71 |
6/17/2024 | VWX | 747128.9063 | 24520883.71 |
6/17/2024 | YZ | 6231908.57 | 24520883.71 |
6/18/2024 | ABC | 1223553.78 | 24510029.57 |
6/18/2024 | DEF | 3742955.56 | 24510029.57 |
6/18/2024 | GHI | 8055188.67 | 24510029.57 |
6/18/2024 | JKL | 1228704.36 | 24510029.57 |
6/18/2024 | MNO | 1961685.05 | 24510029.57 |
6/18/2024 | PQR | 742089.8438 | 24510029.57 |
6/18/2024 | STU | 482353.5156 | 24510029.57 |
6/18/2024 | VWX | 747246.0938 | 24510029.57 |
6/18/2024 | YZ | 6231908.57 | 24510029.57 |
6/18/2024 | ACD | 94344.13 | 24510029.57 |
Thanks very much for any and all help, it's appreciated.
You can merge
df1
and df2
with how='left'
:
df1 = df1.merge(df2, how="left")
But you don't need to calculate the sums separately on df2
. You can do it directly on df1
, using groupby
on 'Date'
, then transform
with 'sum'
:
df1["Total"] = df1.groupby("Date")["DollarAmt"].transform("sum")
Both will output (using pd.set_option("display.float_format", "{:.2f}".format)
to prevent scientific notation):
Date InstrumentID DollarAmt Total
0 6/17/2024 ABC 1237710.88 24520883.71
1 6/17/2024 DEF 3785016.60 24520883.71
2 6/17/2024 GHI 8073892.08 24520883.71
3 6/17/2024 JKL 1239084.56 24520883.71
4 6/17/2024 MNO 1981981.96 24520883.71
5 6/17/2024 PQR 741914.06 24520883.71
6 6/17/2024 STU 482246.09 24520883.71
7 6/17/2024 VWX 747128.91 24520883.71
8 6/17/2024 YZ 6231908.57 24520883.71
9 6/18/2024 ABC 1223553.78 24510029.57
10 6/18/2024 DEF 3742955.56 24510029.57
11 6/18/2024 GHI 8055188.67 24510029.57
12 6/18/2024 JKL 1228704.36 24510029.57
13 6/18/2024 MNO 1961685.05 24510029.57
14 6/18/2024 PQR 742089.84 24510029.57
15 6/18/2024 STU 482353.52 24510029.57
16 6/18/2024 VWX 747246.09 24510029.57
17 6/18/2024 YZ 6231908.57 24510029.57
18 6/18/2024 ACD 94344.13 24510029.57