Tags: python, pandas, dataframe, loops, row

How to use a for loop on a Pandas dataframe and update column based on when another column value changes


Python beginner here. I'm working with a pandas DataFrame and want to loop over its rows, setting the values of one column based on another column's value. I have about 20 lines of code that process the rows, and I want that code to restart every time the value in a particular column changes. See the example table below:

name      country  section  new_column
bob       US       1
jim       Canada   1    
christina US       2    
jason     UK       3    
kim       US       3    
chris     UK       4    
jimbo     Canada   4    
felicia   Canada   5

I would like to update my table to look as follows with a for loop:

name      country  section  new_column
bob       US       1        1
jim       Canada   1        2   
christina US       2        1   
jason     UK       3        1
kim       US       3        2
chris     UK       4        1
jimbo     Canada   4        2
felicia   Canada   5        1

Say my table looks like the one above. Every time the value in column 'section' changes (from 1 to 2, 2 to 3, etc.), the loop should restart. I was hoping I could simply put my lines of code inside a for loop structured that way.

My pseudo code goes as follows:

  1. for when column 'section' changes value, do the following:
  2. my code
  3. close loop?

I hope this makes sense.


Solution

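  • Use groupby with cumcount. cumcount numbers the rows within each 'section' group starting from 0, so adding 1 gives a counter that restarts at 1 for each section: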

    df['new_column'] = df.groupby('section').cumcount().add(1)
    print(df)
    
    # Output
            name country  section  new_column
    0        bob      US        1           1
    1        jim  Canada        1           2
    2  christina      US        2           1
    3      jason      UK        3           1
    4        kim      US        3           2
    5      chris      UK        4           1
    6      jimbo  Canada        4           2
    7    felicia  Canada        5           1
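
The one-liner above assumes df already exists and that each section's rows sit together. As a rough, self-contained sketch, the block below rebuilds the sample table from the question, applies the same cumcount approach, and then shows two variations. The names run_id, run_counter, loop_counter and process_section are hypothetical and only for illustration: run_id restarts the counter whenever 'section' differs from the previous row (useful if a section value can reappear later), and process_section is a stand-in for the roughly 20 lines of per-section code mentioned in the question.

```python
import pandas as pd

# Sample data copied from the table in the question.
df = pd.DataFrame({
    "name": ["bob", "jim", "christina", "jason", "kim", "chris", "jimbo", "felicia"],
    "country": ["US", "Canada", "US", "UK", "US", "UK", "Canada", "Canada"],
    "section": [1, 1, 2, 3, 3, 4, 4, 5],
})

# Approach from the answer: number the rows within each section, starting at 1.
df["new_column"] = df.groupby("section").cumcount() + 1

# Variation (assumption: a section value may reappear later in the table).
# run_id increases by 1 each time 'section' differs from the previous row,
# so every consecutive run of equal values gets its own group.
run_id = (df["section"] != df["section"].shift()).cumsum()
df["run_counter"] = df.groupby(run_id).cumcount() + 1


def process_section(block: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical stand-in for the per-section code from the question."""
    block = block.copy()  # avoid modifying a slice of the original frame
    block["loop_counter"] = range(1, len(block) + 1)
    return block


# Explicit for-loop style: iterate over the groups, so the per-section code
# starts fresh for each section, then stitch the pieces back together.
parts = [process_section(block) for _, block in df.groupby("section", sort=False)]
result = pd.concat(parts)
print(result)
```

For a simple counter the vectorized cumcount line is usually enough; iterating over the groups is mainly worth it when the per-section logic is too involved to express as a single pandas operation.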