Search code examples
pythonpandastrading

how to group by trade and not by time


                                   open     high     low       close    Volume
ctime                                                   
2022-11-07 01:00:00.012000+01:00  20900.0  20900.0  20900.0  20900.0     209.00
2022-11-07 01:00:00.019000+01:00  20900.1  20900.1  20900.1  20900.1    1254.00
2022-11-07 01:00:00.111000+01:00  20900.0  20900.0  20900.0  20900.0   11390.50
2022-11-07 01:00:00.188000+01:00  20900.1  20900.1  20900.1  20900.1      20.90
2022-11-07 01:00:00.194000+01:00  20900.0  20900.0  20900.0  20900.0    4995.10
2022-11-07 01:00:00.203000+01:00  20900.0  20900.0  20900.0  20900.0     209.00
2022-11-07 01:00:00.217000+01:00  20900.0  20900.0  20900.0  20900.0  239827.50
2022-11-07 01:00:00.237000+01:00  20900.0  20900.0  20900.0  20900.0      20.90
2022-11-07 01:00:03.028000+01:00  20900.0  20900.0  20900.0  20900.0      20.90
2022-11-07 01:00:03.743000+01:00  20900.0  20900.0  20900.0  20900.0      41.80

I need to gather in periods of 5 trades, that is to say that the open is the first row, that the close is the 5th trade and the minimum and the maximum is the corresponding minimum and maximum of the 5 trades, the same for the volume. there should be only 2 rows left in the exposed example once the transformation is done.

I have tried searching pandas for an approximation but I have not found a satisfactory solution


Solution

  • Define a new series that put the trades into groups of 5, then aggregate per group:

    trade_group = np.arange(len(df)) // 5
    
    # The `open` of the group is the `open` of the first trade in the group
    # `high` is max of the `high` in the group
    # and so on...
    df.groupby(trade_group).agg({
        "open": "first",
        "high": "max",
        "low": "min",
        "close": "last",
        "Volume": "sum"
    })