Search code examples
pythonpandasdataframeshiftcumsum

How to apply multiple conditions for columns in a dataframe?


I have a data frame that looks as shown below and there is no relationship between the two columns

item col1
cat 1
cat 1
dog 1
fish 1
fish 1
fish 2
snake 2
snake 2
snake 2

I want to generate a new column using these two. The resulting column should do a cumulative sum for the items column (increments number only when the item changes) and also the number must start from 1 again if the series in col1 has a change.

item col1 result
cat 1 1
cat 1 1
dog 1 2
fish 1 3
fish 1 3
fish 2 1
snake 2 2
snake 2 2
snake 2 2

I was able to do the cumulative sum with the below code,

df["result"] = (df["item"] != df["item"].shift(1)).cumsum()

But the condition to reset the counter to 1 when there is a change in col1 is something I need.


Solution

  • Check with groupby and factorize within transform

    df['new'] = df.groupby('col1').item.transform(lambda x : x.factorize()[0]+1)
    df
        item  col1  new
    0    cat     1    1
    1    cat     1    1
    2    dog     1    2
    3   fish     1    3
    4   fish     1    3
    5   fish     2    1
    6  snake     2    2
    7  snake     2    2
    8  snake     2    2