Search code examples
pythonpandassortingdata-manipulation

Calculating the amount of the previous transaction of a different category for each transaction in a DataFrame


I have a DataFrame df that contains transactions from customers. Each row in the DataFrame represents a transaction and has the following columns:

  • KEY_ID: the ID of the customer who made the transaction.
  • TYPE: the type of transaction, which can be of two categories.
  • DATE: the date the transaction was made.
  • AMOUNT: the amount of the transaction.

I need to add a new column to the DataFrame that contains the amount of the previous transaction of the opposite category for each transaction. That is, for each transaction of a certain type, I want to know how much was the amount of the last transaction of a different type that the same customer made before the date of the current transaction.

First I sorted the df by KEY_ID and DATE:

df= df.sort_values(by=['KEY_ID', 'FECHA'])

Then tried to create a function that creates a subset called previous_requests

def find_previous_request(row):
    # Get previous requests of the opposite type
    previous_requests = df[
        (df['KEY_ID'] == row['KEY_ID']) &
        (df['TYPE'] != row['TYPE']) &
        (df['DATE'] < row['DATE'])
    ]
    
    # If there are previous requests, return the amount of the most recent one
    if not previous_requests.empty:
        return previous_requests.iloc[-1]['AMOUNT']
    
    # If there are no previous requests, return NaN
    return np.nan

The kernel crashes when I apply the function.

Also tried the following code:

df['prev_amount'] = df.groupby(['KEY_ID', 'TYPE'])['AMOUNT'].shift().fillna(0).astype(int)

EDIT:

Input:

KEY_ID TYPE AMOUNT DATE
1 Motor 5000 2020-01-01
1 Tool 3000 2020-02-01
1 Tool 7000 2020-03-01
2 Tool 2000 2020-01-15
2 Motor 6000 2020-02-15
2 Tool 4000 2020-03-15

Expected Output:

KEY_ID TYPE AMOUNT DATE PREV_AMOUNT
1 Motor 5000 2020-01-01 NaN
1 Tool 3000 2020-02-01 5000
1 Tool 7000 2020-03-01 5000
2 Tool 2000 2020-01-15 NaN
2 Motor 6000 2020-02-15 2000
2 Tool 4000 2020-03-15 6000

Solution

  • based on your sample data, I guess @Michael Cao has pointed out 99% of the answer, his answer misses the KEY_ID, since you don't want to have the amounts of the KEY_ID 1 in KEY_ID 2:

    grouped = df.groupby(['KEY_ID'])
    
    for name, group in grouped:
        last_motor = None
        last_tool = None
        for ind, row in group.sort_values(['KEY_ID', 'DATE']).iterrows():
            if row.TYPE == 'Motor':
                df.loc[ind, 'PREV_AMOUNT'] = last_tool
                last_motor = row.AMOUNT
            if row.TYPE == 'Tool':
                df.loc[ind, 'PREV_AMOUNT'] = last_motor
                last_tool = row.AMOUNT
    

    I hope this helps!