Search code examples
pythonpandasgroup-by

Some manipulations with groupby Pandas


I have this Dataframe

import pandas as pd
import math
from pandas import Timestamp

Date = [Timestamp('2024-03-16 23:59:42'), Timestamp('2024-03-16 23:59:42'), Timestamp('2024-03-16 23:59:44'), Timestamp('2024-03-16 23:59:44'), Timestamp('2024-03-16 23:59:44'), Timestamp('2024-03-16 23:59:47'), Timestamp('2024-03-16 23:59:48'), Timestamp('2024-03-16 23:59:48'), Timestamp('2024-03-16 23:59:49'), Timestamp('2024-03-16 23:59:49'), Timestamp('2024-03-16 23:59:49'), Timestamp('2024-03-16 23:59:49'), Timestamp('2024-03-16 23:59:49'), Timestamp('2024-03-16 23:59:49'), Timestamp('2024-03-16 23:59:49'), Timestamp('2024-03-16 23:59:49'), Timestamp('2024-03-16 23:59:49'), Timestamp('2024-03-16 23:59:49'), Timestamp('2024-03-16 23:59:49'), Timestamp('2024-03-16 23:59:49')]
Price = [0.6729, 0.6728, 0.6728, 0.6728, 0.6728, 0.673, 0.6728, 0.6729, 0.6728, 0.6728, 0.6728, 0.6728, 0.6728, 0.6728, 0.6728, 0.6728, 0.6728, 0.6728, 0.6729, 0.6728]
Side = [-1, -1, -1, 1, -1, 1, -1, 1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, 1, -1]
Amount = [1579.2963000000002, 7.400799999999999, 6.728, 177.61919999999998, 797.2679999999999, 33650.0, 131.196, 48.448800000000006, 0.6728, 0.6728, 0.6728, 6.728, 0.6728, 1.3456, 0.6728, 0.6728, 0.6728, 0.6728, 0.6729, 0.6728]
buy = [math.nan, math.nan, math.nan, 177.61919999999998, math.nan, 33650.0, math.nan, 48.448800000000006, math.nan, math.nan, math.nan, math.nan, math.nan, math.nan, math.nan, math.nan, math.nan, math.nan, 49.121700000000004, math.nan]

df = pd.DataFrame({
    'Date':Date,
    'Price':Price,
    'Side':Side,
    'Amount':Amount,
    'buy':buy
})

print(df)

I got buy column using

df['buy'] = df[df['Side'] == 1].groupby([df['Date'].dt.floor('H'), 'Price'])['Amount'].cumsum()

But I want to get 0 in the buy column instead of nan values, if this price has not yet been met in the group or the previous value of the cumulative sum

Result buy column need - [0,0,0,177.6192,177.6192,33650, 177.6192,48.4488, 177.6192,.....]

How can I implement this?


Solution

  • You could reindex, ffill and fillna:

    df['buy'] = (df[df['Side'] == 1].groupby([df['Date'].dt.floor('H'), 'Price'])['Amount'].cumsum()
                 .reindex(df.index).ffill().fillna(0)
                )
    

    Or in two steps:

    df['buy'] = df[df['Side'] == 1].groupby([df['Date'].dt.floor('H'), 'Price'])['Amount'].cumsum()
    df['buy'] = df['buy'].ffill().fillna(0)
    

    Output:

                      Date   Price  Side      Amount         buy
    0  2024-03-16 23:59:42  0.6729    -1   1579.2963      0.0000
    1  2024-03-16 23:59:42  0.6728    -1      7.4008      0.0000
    2  2024-03-16 23:59:44  0.6728    -1      6.7280      0.0000
    3  2024-03-16 23:59:44  0.6728     1    177.6192    177.6192
    4  2024-03-16 23:59:44  0.6728    -1    797.2680    177.6192
    5  2024-03-16 23:59:47  0.6730     1  33650.0000  33650.0000
    6  2024-03-16 23:59:48  0.6728    -1    131.1960  33650.0000
    7  2024-03-16 23:59:48  0.6729     1     48.4488     48.4488
    8  2024-03-16 23:59:49  0.6728    -1      0.6728     48.4488
    9  2024-03-16 23:59:49  0.6728    -1      0.6728     48.4488
    10 2024-03-16 23:59:49  0.6728    -1      0.6728     48.4488
    11 2024-03-16 23:59:49  0.6728    -1      6.7280     48.4488
    12 2024-03-16 23:59:49  0.6728    -1      0.6728     48.4488
    13 2024-03-16 23:59:49  0.6728    -1      1.3456     48.4488
    14 2024-03-16 23:59:49  0.6728    -1      0.6728     48.4488
    15 2024-03-16 23:59:49  0.6728    -1      0.6728     48.4488
    16 2024-03-16 23:59:49  0.6728    -1      0.6728     48.4488
    17 2024-03-16 23:59:49  0.6728    -1      0.6728     48.4488
    18 2024-03-16 23:59:49  0.6729     1      0.6729     49.1217
    19 2024-03-16 23:59:49  0.6728    -1      0.6728     49.1217
    

    Alternatively, if you want to fill with the first valid non-NA you could use:

    df['buy'] = df[df['Side'] == 1].groupby([df['Date'].dt.floor('H'), 'Price'])['Amount'].cumsum()
    m = df['buy'].notna()
    df['buy'] = df['buy'].fillna(df.loc[df['buy'].first_valid_index(), 'buy']).where(m.cummax(), 0)
    

    Output:

                      Date   Price  Side      Amount         buy
    0  2024-03-16 23:59:42  0.6729    -1   1579.2963      0.0000
    1  2024-03-16 23:59:42  0.6728    -1      7.4008      0.0000
    2  2024-03-16 23:59:44  0.6728    -1      6.7280      0.0000
    3  2024-03-16 23:59:44  0.6728     1    177.6192    177.6192
    4  2024-03-16 23:59:44  0.6728    -1    797.2680    177.6192
    5  2024-03-16 23:59:47  0.6730     1  33650.0000  33650.0000
    6  2024-03-16 23:59:48  0.6728    -1    131.1960    177.6192
    7  2024-03-16 23:59:48  0.6729     1     48.4488     48.4488
    8  2024-03-16 23:59:49  0.6728    -1      0.6728    177.6192
    9  2024-03-16 23:59:49  0.6728    -1      0.6728    177.6192
    10 2024-03-16 23:59:49  0.6728    -1      0.6728    177.6192
    11 2024-03-16 23:59:49  0.6728    -1      6.7280    177.6192
    12 2024-03-16 23:59:49  0.6728    -1      0.6728    177.6192
    13 2024-03-16 23:59:49  0.6728    -1      1.3456    177.6192
    14 2024-03-16 23:59:49  0.6728    -1      0.6728    177.6192
    15 2024-03-16 23:59:49  0.6728    -1      0.6728    177.6192
    16 2024-03-16 23:59:49  0.6728    -1      0.6728    177.6192
    17 2024-03-16 23:59:49  0.6728    -1      0.6728    177.6192
    18 2024-03-16 23:59:49  0.6729     1      0.6729     49.1217
    19 2024-03-16 23:59:49  0.6728    -1      0.6728    177.6192