I am querying a dataframe like the one below:
>>> df
A,B,C
1,1,200
1,1,433
1,1,67
1,1,23
1,2,330
1,2,356
1,2,56
1,3,30
If I do part_df = df[(df['A'] == 1) & (df['B'] == 2)], I get a sub-dataframe:
>>> part_df
A, B, C
1, 2, 330
1, 2, 356
1, 2, 56
Now I want to make some changes to part_df, like:
part_df['C'] = 0
The changes are not reflected in the original df at all. I guess this is because of numpy's array mechanism, where a new copy of the dataframe is produced every time. How do I query a dataframe with some conditions, make changes to the selected part as in the example above, and have the new values reflected back in the original dataframe in place?
You should do this instead:
In [28]:
df.loc[(df['A'] == 1) & (df['B'] == 2),'C']=0
df
Out[28]:
A B C
0 1 1 200
1 1 1 433
2 1 1 67
3 1 1 23
4 1 2 0
5 1 2 0
6 1 2 0
7 1 3 30
[8 rows x 3 columns]
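You should use loc: pass the boolean condition as the row selector and the column of interest, 'C', as the second item in the square brackets. Note that each comparison needs its own parentheses, because & binds more tightly than ==.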
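For completeness, here is a minimal self-contained sketch that rebuilds the dataframe from the question and contrasts the two approaches. The names mask, unchanged and updated are only illustrative, and the explicit .copy() is added here to make the copy behaviour obvious (and to avoid a SettingWithCopyWarning); it is not part of the original question.

```python
import pandas as pd

# Rebuild the example dataframe from the question
df = pd.DataFrame({
    "A": [1, 1, 1, 1, 1, 1, 1, 1],
    "B": [1, 1, 1, 1, 2, 2, 2, 3],
    "C": [200, 433, 67, 23, 330, 356, 56, 30],
})

# Parenthesise each comparison: & has higher precedence than ==
mask = (df["A"] == 1) & (df["B"] == 2)

# Boolean indexing gives a separate object; editing it leaves df untouched
part_df = df[mask].copy()
part_df["C"] = 0
unchanged = df["C"].tolist()  # snapshot of column C in the original df

# Assigning through .loc on the original dataframe modifies it in place
df.loc[mask, "C"] = 0
updated = df["C"].tolist()

print(unchanged)
print(updated)
```

Storing the condition in a variable such as mask also makes it easy to reuse the same selection for several columns.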