I have a data frame that looks as shown below and there is no relationship between the two columns
item | col1 |
---|---|
cat | 1 |
cat | 1 |
dog | 1 |
fish | 1 |
fish | 1 |
fish | 2 |
snake | 2 |
snake | 2 |
snake | 2 |
I want to generate a new column using these two. The resulting column should do a cumulative sum for the items column (increments number only when the item changes) and also the number must start from 1 again if the series in col1 has a change.
item | col1 | result |
---|---|---|
cat | 1 | 1 |
cat | 1 | 1 |
dog | 1 | 2 |
fish | 1 | 3 |
fish | 1 | 3 |
fish | 2 | 1 |
snake | 2 | 2 |
snake | 2 | 2 |
snake | 2 | 2 |
I was able to do the cumulative sum with the below code,
df["result"] = (df["item"] != df["item"].shift(1)).cumsum()
But the condition to reset the counter to 1 when there is a change in col1 is something I need.
Check with groupby
and factorize
within transform
df['new'] = df.groupby('col1').item.transform(lambda x : x.factorize()[0]+1)
df
item col1 new
0 cat 1 1
1 cat 1 1
2 dog 1 2
3 fish 1 3
4 fish 1 3
5 fish 2 1
6 snake 2 2
7 snake 2 2
8 snake 2 2