I have unique records in a dataframe, with no duplicates, as determined by the combination of values across several columns:
import pandas as pd
d = {'Alpha' : ['C', 'B', 'C','D', 'A', 'A'], 'Beta' : ['G', 'F', 'G', 'H', 'A', 'A'],'Year': ['Base', 88, 94, 22, 'Base', 66], 'Zulu' : [1, 2, -3, 4, 5, -3]}
df = pd.DataFrame(d)
This gets us:
Alpha Beta Year Zulu
0 C G Base 1
1 B F 88 2
2 C G 94 -3
3 D H 22 4
4 A A Base 5
5 A A 66 -3
Index 0 is a near match with index 2, and index 4 is a near match with index 5, except that the 'Year' value for indices 0 and 4 is Base, and the 'Zulu' values for indices 2 and 5 are negative. I want to replace the negative 'Zulu' values in rows 2 and 5 with the corresponding 'Zulu' values from rows 0 and 4, without disrupting the other rows. The output df would look like this:
Alpha Beta Year Zulu
0 C G Base 1
1 B F 88 2
2 C G 94 1
3 D H 22 4
4 A A Base 5
5 A A 66 5
I can easily start with a filter and a deep copy to avoid slicing issues, then keep a backup of the original data:
df_sub = df[df.Zulu < 0].copy(deep=True)
df_sub['Zulu_backup'] = df_sub.Zulu
I can then use df_sub to merge with the original:
df_result = pd.merge(df_sub, df, how='right')
Resulting in:
Alpha Beta Year Zulu Zulu_backup
0 C G 94 -3 -3.0
1 A A 66 -3 -3.0
2 C G Base 1 NaN
3 B F 88 2 NaN
4 D H 22 4 NaN
5 A A Base 5 NaN
But I don't know where to go from here without messing up the dataframe's other rows; my current script is rather convoluted. I am guessing I could create a Series and map it onto the dataframe, but I'm at a loss as to how to do so. Iterating is not an option, as I have millions of rows. I would appreciate any assistance.
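The mapping idea described above can be sketched as follows, assuming each (Alpha, Beta) pair has exactly one non-negative 'Zulu' row to serve as the lookup value (the variable names `lookup`, `mask`, and `keys` are illustrative, not from the original post):

```python
import pandas as pd

df = pd.DataFrame({'Alpha': ['C', 'B', 'C', 'D', 'A', 'A'],
                   'Beta':  ['G', 'F', 'G', 'H', 'A', 'A'],
                   'Year':  ['Base', 88, 94, 22, 'Base', 66],
                   'Zulu':  [1, 2, -3, 4, 5, -3]})

# Build a lookup from (Alpha, Beta) to the non-negative Zulu value.
lookup = df[df['Zulu'] >= 0].set_index(['Alpha', 'Beta'])['Zulu'].to_dict()

# Replace only the negative rows, leaving everything else untouched.
mask = df['Zulu'] < 0
keys = list(zip(df.loc[mask, 'Alpha'], df.loc[mask, 'Beta']))
df.loc[mask, 'Zulu'] = [lookup[k] for k in keys]
```

This avoids any row-by-row iteration over the full dataframe: the lookup is built once and the replacement is a single vectorized assignment on the masked rows.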
One way would be to convert the negative values in Zulu to NaN and then fill them within each (Alpha, Beta) group. Using transform (rather than apply) keeps the result aligned with the original index:
import numpy as np
df.loc[df['Zulu'] < 0, 'Zulu'] = np.nan
df['Zulu'] = df.groupby(['Alpha', 'Beta'])['Zulu'].transform(lambda x: x.ffill().bfill()).astype(int)
Alpha Beta Year Zulu
0 C G Base 1
1 B F 88 2
2 C G 94 1
3 D H 22 4
4 A A Base 5
5 A A 66 5
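The same idea can be written as one chained expression that never mutates the column in place, using Series.mask instead of an explicit NaN assignment (a sketch of the same approach, not a different algorithm):

```python
import pandas as pd

df = pd.DataFrame({'Alpha': ['C', 'B', 'C', 'D', 'A', 'A'],
                   'Beta':  ['G', 'F', 'G', 'H', 'A', 'A'],
                   'Year':  ['Base', 88, 94, 22, 'Base', 66],
                   'Zulu':  [1, 2, -3, 4, 5, -3]})

# mask() turns negative values into NaN; the group-wise transform then
# fills them from the matching (Alpha, Beta) row in a single expression.
df['Zulu'] = (df['Zulu'].mask(df['Zulu'] < 0)
                        .groupby([df['Alpha'], df['Beta']])
                        .transform(lambda s: s.ffill().bfill())
                        .astype(int))
```

Grouping by the Alpha and Beta Series directly (rather than column names) is what lets this work on a bare Series without touching the rest of the dataframe.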