I have a dataframe which I want to group based on the name. Once grouped, I want to go through each row of each group and update the values of a column to then do other operations.
The problem is that when I update a row, the value is updated in the dataframe, but the row object I am iterating over is not.
For example, in this case df_group.Age outputs 25, which is the updated value, but row.Age still outputs 20, the old value. How can I make row.Age update within that same iteration so that I can keep working with the updated value?
import pandas as pd

data = {'Name': ['A', 'B', 'C', 'D', 'A', 'B', 'D'],
        'Age': [20, 21, 19, 18, 21, 19, 18],
        'Size': [7, 7, 9, 8, 7, 9, 8]}
df = pd.DataFrame(data).sort_values(by='Name').reset_index(drop=True)
df['New_age'] = 0
df_grouped = df.groupby(['Name'])

for group_name, df_group in df_grouped:
    for row in df_group.itertuples():
        if row.Age == 20:
            df_group.at[row.Index, 'Age'] = 25
        print(df_group.Age)
        print(row.Age)
        # Do things with the row.Age value = 25
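The reason row.Age is not updated inside the itertuples loop is that each row object is a named tuple, which is immutable: it is a snapshot of the row's values taken when the tuple was created, so later writes to the DataFrame do not change it. To get what you want, update the value in the DataFrame with the df.loc accessor, then rebind row to an updated copy with _replace so the rest of the iteration sees the new value: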
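for group_name, df_group in df_grouped:
    for row in df_group.itertuples():
        if row.Age == 20:
            # Write to the original DataFrame; df_group is a separate
            # per-group frame and is not guaranteed to reflect this write
            df.loc[row.Index, 'Age'] = 25
            # Named tuples are immutable, so rebind row to an updated copy
            row = row._replace(Age=25)
        # Read the group's rows back from df to see the updated value
        print(df.loc[df_group.index, 'Age'])
        print(row.Age)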
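For reference, here is a minimal self-contained sketch of the same idea that you can run on its own. It reuses the data from the question; the seen_ages list is a hypothetical helper added only to record which value each iteration ends up working with, and the try/except is there only to show that direct assignment to the named tuple fails.

```python
import pandas as pd

data = {'Name': ['A', 'B', 'C', 'D', 'A', 'B', 'D'],
        'Age': [20, 21, 19, 18, 21, 19, 18],
        'Size': [7, 7, 9, 8, 7, 9, 8]}
df = pd.DataFrame(data).sort_values(by='Name').reset_index(drop=True)

seen_ages = []  # hypothetical helper: the Age each iteration ends up using

for group_name, df_group in df.groupby('Name'):
    for row in df_group.itertuples():
        if row.Age == 20:
            try:
                row.Age = 25  # named tuples reject attribute assignment
            except AttributeError:
                pass
            df.loc[row.Index, 'Age'] = 25  # persist the change in the original frame
            row = row._replace(Age=25)     # new tuple with the updated field
        seen_ages.append(row.Age)          # sees 25 in the same iteration

print(seen_ages)
print(df['Age'].tolist())
```

Note that _replace returns a new tuple rather than modifying the existing one, so the assignment back to row is what makes the updated value visible for the rest of that iteration.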