Search code examples
pythonpandasfinancefifo

Pandas: Drop rows at beginning of groups if a column equals certain string value ('Sell' or 'Buy')


To clarify, the 'group' in the title is not a result of pd.groupby. Instead, I meant it as rows that share the same values of certain columns. In my case it would be account and symbol.

I am trying to calculate profits&loss by account and position from trade data on a First-in, first-out (FIFO). As a result, when cumulative share quantities drops below zero, namely, when the most recent sell shares is bigger than all the buy shares prior combined, I need to reset it to 0. Same deal when the trade data begins with sell records.

I am trying to design a cumulative sum which will reset to 0 to help with the process. What I have is:

    def cumsum_with_reset(group):
        cumulative_sum = 0
        group['reset_cumsum'] = 0
        for index, row in group.iterrows():
            cumulative_sum += row['Modified_Quantity']
            if cumulative_sum < 0:
                cumulative_sum = 0
            group.loc[index, 'reset_cumsum'] = cumulative_sum
        return group

This function can return 0 if a group, namely the rows with the same account and symbol, begins with sell records. However, the problem is iterrows is so inefficient that it takes forever for large amount of data so I want to create a new function but I am stuck at the very first step: how to remove the sell rows in each group before the buy rows?

Using some sample data:

pd.DataFrame(data = [['2022-01-01', 'foo', 'AMZN', 'buy', 10, 22],
 ['2022-01-02', 'foo', 'AMZN', 'sell', 15, 24],
 ['2022-01-03', 'cat', 'FB', 'sell', 5, 12],
 ['2022-01-04', 'cat', 'FB', 'buy', 17, 15],
 ['2022-01-05', 'cat', 'FB', 'sell', 15, 13],
 ['2022-01-06', 'bar', 'AAPL', 'buy', 10, 10],
 ['2022-01-07', 'bar', 'AAPL', 'buy', 5, 12],
 ['2022-01-08', 'bar', 'AAPL', 'sell', 8, 12],
 ['2022-01-09', 'bar', 'AAPL', 'sell', 12, 14],
 ['2022-01-10', 'dog', 'GOOG', 'sell', 20, 13],
 ['2022-01-11', 'dog', 'GOOG', 'buy', 15, 13],
 ['2022-01-12', 'dog', 'GOOG', 'buy', 5, 13],
 ['2022-01-13', 'dog', 'GOOG', 'sell', 7, 14]], columns = ['Date', 'account', 'symbol', 'Action', 'Quantity', 'Price'])

which looks like this:

sample

There are 4 groups in this dataset:

grouped

2nd and 4th group start with sell records, row 2 and row 9. How can I use Pandas to get rid of such records until each group starts with buy records?


Solution

  • It's pretty trivial if there no more than one sell in the beginning of a group:

    # assuming df is sorted by symbol + date
    df.loc[(df['symbol'] == df['symbol'].shift()) | (df['Action'] != 'sell')]
    

    If multiple consequtive sells are to be removed, we'll need to track state of the previous row removal:

    last = False
    df.loc[[not (last := (action == 'sell' and (last or current != prev)))  
            for action, current, prev  
            in zip(df['Action'], df['symbol'], df['symbol'].shift())]]