Search code examples
pythonpandasreplaceconditional-statementsrecords

Pandas: replace records based on conditional test in column


I have unique records in a dataframe, with no duplicates, as determined by combination of values across several columns:

import pandas as pd
d = {'Alpha' : ['C', 'B', 'C','D', 'A', 'A'], 'Beta' : ['G', 'F', 'G', 'H', 'A', 'A'],'Year': ['Base', 88, 94, 22, 'Base', 66], 'Zulu' : [1, 2, -3, 4, 5, -3]}
df = pd.DataFrame(d)   

This gets us:

  Alpha Beta  Year  Zulu
0     C    G  Base     1
1     B    F    88     2
2     C    G    94    -3
3     D    H    22     4
4     A    A  Base     5
5     A    A    66    -4

Index 0 is a near match with index 2, & index 4 is a near match with index 5, except that the 'Year' value for index 0 & index 4 is Base, and 'Zulu' value for index 2 & 5 are negative. I want to replace the negative 'Zulu' values in 2 & 5 with their corresponding Zulu values in index 0 & 4, without disrupting other rows. The output df would look like this:

  Alpha Beta  Year  Zulu
0     C    G  Base     1
1     B    F    88     2
2     C    G    94     1
3     D    H    22     4
4     A    A  Base     5
5     A    A    66     5

I can easily start with a filter & deep copy to avoid slicing issues, & then a backup of original data:

df_sub = df[df.Zulu < 0].copy(deep=True)
df_sub['Zulu_backup'] = df_sub.Zulu

I can then use df_sub to merge with the original:

df_result = pd.merge(df_sub, df, how='right')

Resulting in:

Alpha Beta  Year    Zulu     Zulu_backup
0     C    G    94    -3         -3.0
1     A    A    66    -3         -3.0
2     C    G  Base     1          NaN
3     B    F    88     2          NaN
4     D    H    22     4          NaN
5     A    A  Base     5          NaN

But I don't know where to go from here without messing up the dataframe's other rows. I have a rather convoluted script. I am guessing I could create a series and map it to the dataframe, but I'm at a loss as to how to do so. Iterating through this is not an option as I have millions of rows. Would appreciate any assistance.


Solution

  • One way would be to convert the negative values in Zulu to nan and then fillna

    df.loc[df['Zulu'] < 0, 'Zulu'] = np.nan
    df['Zulu'] = df.groupby(['Alpha', 'Beta']).Zulu.apply(lambda x: x.ffill().bfill()).astype(int)
    
    
        Alpha   Beta    Year    Zulu
    0   C       G       Base    1
    1   B       F       88      2
    2   C       G       94      1
    3   D       H       22      4
    4   A       A       Base    5
    5   A       A       66      5