Python beginner here. I'm working with a pandas DataFrame and want to loop over its rows and set the values of one column based on another column's value. I have about 20 lines of code that process the rows, and I want that code to restart every time the value in a particular column changes. See the example table below:
name       country  section  new_column
bob        US       1
jim        Canada   1
christina  US       2
jason      UK       3
kim        US       3
chris      UK       4
jimbo      Canada   4
felicia    Canada   5
I would like to update my table to look as follows with a for loop:
name       country  section  new_column
bob        US       1        1
jim        Canada   1        2
christina  US       2        1
jason      UK       3        1
kim        US       3        2
chris      UK       4        1
jimbo      Canada   4        2
felicia    Canada   5        1
So, let's say my table looks something like the one above. Every time the 'section' column changes value (from 1 to 2, 2 to 3, etc.), the loop should restart for that new value. I was hoping I could simply put my lines of code inside a for loop structured that way.
My pseudo code goes as follows:
I hope this makes sense.
Use groupby with cumcount, which numbers the rows within each group starting at 0 (hence the add(1)):
df['new_column'] = df.groupby('section').cumcount().add(1)
print(df)
# Output
name country section new_column
0 bob US 1 1
1 jim Canada 1 2
2 christina US 2 1
3 jason UK 3 1
4 kim US 3 2
5 chris UK 4 1
6 jimbo Canada 4 2
7 felicia Canada 5 1
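For completeness, here is a self-contained sketch that rebuilds the example table and shows two variations that may be useful. The first is an explicit loop over the groups, in case the 20 lines of per-section code cannot be vectorized; `process_section` and `loop_column` are hypothetical names standing in for that code and its output, since the original pseudocode was not shown. The second handles the case where the counter should restart on every change in 'section' even if a value reappears later; `run_id` and `run_counter` are also names made up for this sketch.

```python
import pandas as pd

# Rebuild the example table from the question.
df = pd.DataFrame({
    "name": ["bob", "jim", "christina", "jason", "kim", "chris", "jimbo", "felicia"],
    "country": ["US", "Canada", "US", "UK", "US", "UK", "Canada", "Canada"],
    "section": [1, 1, 2, 3, 3, 4, 4, 5],
})

# Vectorized version from the answer: 1-based row counter within each section.
df["new_column"] = df.groupby("section").cumcount().add(1)


def process_section(group):
    """Hypothetical placeholder for the per-section logic.

    Here it just numbers the rows of the group starting at 1.
    """
    return list(range(1, len(group) + 1))


# Explicit loop: each iteration sees only the rows of one section,
# so any per-section state starts fresh for every group.
df["loop_column"] = 0
for section, group in df.groupby("section"):
    df.loc[group.index, "loop_column"] = process_section(group)

# Variant: restart on every *change* of value rather than per distinct value.
# run_id increases by 1 each time 'section' differs from the previous row.
run_id = df["section"].ne(df["section"].shift()).cumsum()
df["run_counter"] = df.groupby(run_id).cumcount().add(1)

print(df)
```

With the sample data all three columns hold the same values, because every section value appears in a single contiguous run. They would differ only if a section value showed up again after a different one, in which case `run_counter` restarts at 1 while `new_column` keeps counting.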