This is a very basic question, but it is something I always struggle with. I want to select rows using a filter on one column and apply a change to another column for only those rows.
Example input:
import pandas as pd
cols = ['col1', 'col2']
data = [
[1, 1],
[1, 1],
[2, 1],
[1, 1],
]
df = pd.DataFrame(data=data, columns=cols)
# NOTE: In practice, I will be applying a more complex function
df['col2'] = df.loc[df['col1'] == 1, 'col2'].apply(lambda x: x+1)
Returned output:
col1 col2
0 1 2.0
1 1 2.0
2 2 NaN
3 1 2.0
Expected output:
col1 col2
0 1 2
1 1 2
2 2 1
3 1 2
What's happening:
Records that do not meet the filtering condition are being set to NaN by my apply/lambda routine.
What I'm asking for:
The correct locate/filter-and-apply approach. I can achieve the expected frame using update, but I want to use loc and apply.
By doing df['col2'] = ..., you're setting all the values of col2. But since you're only calling apply on some of the values, the values that aren't included get set to NaN. To fix that, save your mask and reuse it:
mask = df['col1'] == 1
df.loc[mask, 'col2'] = df.loc[mask, 'col2'].apply(lambda x: x+1)
Output:
>>> df
col1 col2
0 1 2
1 1 2
2 2 1
3 1 2
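To see why the NaN appears, and how the mask fix holds up with a less trivial function, here is a minimal sketch. The frame is rebuilt from the question's data; `transform` is a hypothetical stand-in for the "more complex function" mentioned in the question, and the names `partial`, `broken`, and `fixed` are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({'col1': [1, 1, 2, 1], 'col2': [1, 1, 1, 1]})
mask = df['col1'] == 1

# The filtered Series keeps only the matching index labels (0, 1, 3).
partial = df.loc[mask, 'col2'].apply(lambda x: x + 1)
print(partial.index.tolist())

# Assigning it to the whole column aligns on the index,
# so label 2 has no matching value and becomes NaN.
broken = df.copy()
broken['col2'] = partial
print(broken['col2'].isna().tolist())

# Hypothetical stand-in for a more complex function (an assumption,
# not something taken from the question).
def transform(x):
    return x * 10 + 1

# Using the same mask on both sides only touches the matching rows.
fixed = df.copy()
fixed.loc[mask, 'col2'] = fixed.loc[mask, 'col2'].apply(transform)
print(fixed['col2'].tolist())
```

Because the left-hand side is restricted with the same mask, the row where col1 is 2 is never written to and keeps its original value.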