Search code examples
pythonpandasdataframetransactions

Copy rows to new dataframe based on change in date column and transaction


I've checked several websites and StackOverflow, but I wasn't able to find any documentation or help around this topic, so hoping that someone in the community might know the answer. I have a list of groceries, quantity, transaction (B for buy and S for sell) and date purchased/sold:

TRANSACTIONS    Qty Transaction Date
Apple   5   B   1/1/22
Banana  5   B   1/1/22
Cereal  3   B   1/1/22
Milk    4   B   1/1/22
Banana  2   S   2/2/22
Milk    1   S   2/2/22
Toy     4   B   2/2/22

Grocery Transactions

What I would like to do is generate a new dataframe that incorporates the transactions to show the quantity remaining based on the transaction and each date there was a transaction:

BALANCE Qty Date
Apple   5   1/1/22
Banana  5   1/1/22
Cereal  3   1/1/22
Milk    4   1/1/22
Apple   5   2/2/22
Banana  3   2/2/22
Cereal  3   2/2/22
Milk    3   2/2/22
Toy     4   2/2/22

Grocery Balance


Solution

  • Here's a one-liner solution (formatted):

    new_df = (
        df.pivot(index='TRANSACTIONS', columns='Date')
        .pipe(lambda x: x.assign(Qty=x.Qty.fillna(0), Transaction=x.Transaction.ffill(axis=1)))
        .stack()
        .sort_index(level=1)
        .reset_index()
        .dropna()
        .pipe(lambda x: x.assign(Qty=x['Qty'].where(x['Transaction'] == 'B', -x['Qty'])))
        .pipe(lambda x: x.assign(Qty=x.groupby('TRANSACTIONS')['Qty'].cumsum().astype(int)))
        .drop(['Transaction'], axis=1)
        .rename({'TRANSACTIONS': 'BALANCE'}, axis=1)
    )
    

    Output:

    >>> new_df
      BALANCE    Date  Qty
    0   Apple  1/1/22    5
    1  Banana  1/1/22    5
    2  Cereal  1/1/22    3
    3    Milk  1/1/22    4
    5   Apple  2/2/22    5
    6  Banana  2/2/22    3
    7  Cereal  2/2/22    3
    8    Milk  2/2/22    3
    9     Toy  2/2/22    4