Search code examples
pythonpandassum

Calculating daily sum as a separate column for each day


I think this should be a relatively simple question, I'm just not sure how to do it. I have two dataframes. The first dataframe includes positions for each day and each day includes multiple positions.

DF1:

Date InstrumentID DollarAmt
6/17/2024 ABC 1237710.88
6/17/2024 DEF 3785016.6
6/17/2024 GHI 8073892.08
6/17/2024 JKL 1239084.56
6/17/2024 MNO 1981981.96
6/17/2024 PQR 741914.0625
6/17/2024 STU 482246.0938
6/17/2024 VWX 747128.9063
6/17/2024 YZ 6231908.57
6/18/2024 ABC 1223553.78
6/18/2024 DEF 3742955.56
6/18/2024 GHI 8055188.67
6/18/2024 JKL 1228704.36
6/18/2024 MNO 1961685.05
6/18/2024 PQR 742089.8438
6/18/2024 STU 482353.5156
6/18/2024 VWX 747246.0938
6/18/2024 YZ 6231908.57
6/18/2024 ACD 94344.13

I also have another dataframe, where I've separately calculated daily totals of the DollarAmt column from the first dataframe.

DF2:

Date Total
6/17/2024 24520883.71
6/18/2024 24510029.57

What I'm looking for is the following, where a separate column in the first dataframe is created and displays the daily totals from the second dataframe, as below. I'm still new to python so the best way I could come up with to do this was to create a separate dataframe with the totals and then merge. But that end output isn't exactly what I'm looking for. The desired result is below:

Desired Output:

Date InstrumentID DollarAmt Total
6/17/2024 ABC 1237710.88 24520883.71
6/17/2024 DEF 3785016.6 24520883.71
6/17/2024 GHI 8073892.08 24520883.71
6/17/2024 JKL 1239084.56 24520883.71
6/17/2024 MNO 1981981.96 24520883.71
6/17/2024 PQR 741914.0625 24520883.71
6/17/2024 STU 482246.0938 24520883.71
6/17/2024 VWX 747128.9063 24520883.71
6/17/2024 YZ 6231908.57 24520883.71
6/18/2024 ABC 1223553.78 24510029.57
6/18/2024 DEF 3742955.56 24510029.57
6/18/2024 GHI 8055188.67 24510029.57
6/18/2024 JKL 1228704.36 24510029.57
6/18/2024 MNO 1961685.05 24510029.57
6/18/2024 PQR 742089.8438 24510029.57
6/18/2024 STU 482353.5156 24510029.57
6/18/2024 VWX 747246.0938 24510029.57
6/18/2024 YZ 6231908.57 24510029.57
6/18/2024 ACD 94344.13 24510029.57

Thanks very much for any and all help, it's appreciated.


Solution

  • You can merge df1 and df2 with how='left':

    df1 = df1.merge(df2, how="left")
    

    But you don't need to calculate the sums separately on df2. You can do it directly on df1, using groupby on 'Date', then transform with 'sum':

    df1["Total"] = df1.groupby("Date")["DollarAmt"].transform("sum")
    

    Both will output (using pd.set_option("display.float_format", "{:.2f}".format) to prevent scientific notation):

             Date InstrumentID  DollarAmt       Total
    0   6/17/2024          ABC 1237710.88 24520883.71
    1   6/17/2024          DEF 3785016.60 24520883.71
    2   6/17/2024          GHI 8073892.08 24520883.71
    3   6/17/2024          JKL 1239084.56 24520883.71
    4   6/17/2024          MNO 1981981.96 24520883.71
    5   6/17/2024          PQR  741914.06 24520883.71
    6   6/17/2024          STU  482246.09 24520883.71
    7   6/17/2024          VWX  747128.91 24520883.71
    8   6/17/2024           YZ 6231908.57 24520883.71
    9   6/18/2024          ABC 1223553.78 24510029.57
    10  6/18/2024          DEF 3742955.56 24510029.57
    11  6/18/2024          GHI 8055188.67 24510029.57
    12  6/18/2024          JKL 1228704.36 24510029.57
    13  6/18/2024          MNO 1961685.05 24510029.57
    14  6/18/2024          PQR  742089.84 24510029.57
    15  6/18/2024          STU  482353.52 24510029.57
    16  6/18/2024          VWX  747246.09 24510029.57
    17  6/18/2024           YZ 6231908.57 24510029.57
    18  6/18/2024          ACD   94344.13 24510029.57