Pandas locate and apply changes to column


This is something I always struggle with, even though it is a beginner-level problem. Essentially, I want to select rows using a filter on one column and apply a change to another column for only those rows.

Example input:

import pandas as pd
cols = ['col1', 'col2']
data = [
        [1, 1],
        [1, 1],
        [2, 1],
        [1, 1],
]
df = pd.DataFrame(data=data, columns=cols)
# NOTE: In practice, I will be applying a more complex function
df['col2'] = df.loc[df['col1'] == 1, 'col2'].apply(lambda x: x+1)

Returned output:

   col1  col2
0     1   2.0
1     1   2.0
2     2   NaN
3     1   2.0

Expected output:

   col1  col2
0     1     2
1     1     2
2     2     1
3     1     2

What's happening:

Rows that do not meet the filter condition are being set to NaN by my apply / lambda step, and col2 is converted to float as a result.

What I'm asking for:

The correct way to filter with .loc and then apply a function. I can get the expected frame using update; however, I want to use .loc and apply.


Solution

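  • Writing df['col2'] = ... assigns to every row of col2. The right-hand side, however, is a Series that contains only the rows where col1 == 1. pandas aligns that Series on the index, so the rows missing from it are set to NaN (which also turns the column into floats). To fix that, save your mask and use it on both sides of the assignment: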

    mask = df['col1'] == 1
    df.loc[mask, 'col2'] = df.loc[mask, 'col2'].apply(lambda x: x+1)
    

    Output:

    >>> df
       col1  col2
    0     1     2
    1     1     2
    2     2     1
    3     1     2
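
To see why the mask is needed on both sides, here is a minimal sketch using the frame from the question. The helper name `bump` is hypothetical and only stands in for the "more complex function" mentioned there; `partial` and `aligned` are illustrative names as well. The `reindex` call is used purely to show what index alignment does to the filtered result. It is not required for the fix.

```python
import pandas as pd

df = pd.DataFrame({"col1": [1, 1, 2, 1], "col2": [1, 1, 1, 1]})


def bump(x):
    # Hypothetical stand-in for the more complex function in the question.
    return x + 1


mask = df["col1"] == 1

# The filtered result only has the rows selected by the mask (index 0, 1, 3).
partial = df.loc[mask, "col2"].apply(bump)

# Aligning it against the full index shows what a whole-column assignment
# would store: row 2 has no matching entry, so it becomes NaN.
aligned = partial.reindex(df.index)

# Assigning through the same mask writes only to the selected rows,
# so row 2 keeps its original value.
df.loc[mask, "col2"] = partial

print(aligned)
print(df)
```

Because the left-hand side is restricted to the same rows as the right-hand side, no missing values are introduced and col2 should keep its integer dtype.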