I have a specific problem ( not sure if its very challenging to the pros and experts here, but it seems pretty formidable to me) to fix by updating a column based on some conditions in a column headers and values in a column:
I am providing some specific rows of the input Dataframe as an example:
df-in
A B ind M1P M2P M3P M4P M5P
x a 2 0 0 3 5 9
y b 2 Nan Nan Nan 7 11
z c 2 0 Nan 0 3 3
w d 2 0 0 0 Nan 8
u q 2 0 0 0 Nan 0
So now, based on the value of the column 'ind' I need to check the column Mx ( where x can be 1,2,3,4,5). In the above example since all values in ind column are 2, I need to check M2P column and above ( I do not care about M1 column, However if ind was 1 I had to check M1 column). Now in this example if M2P column is 0, nan or blank, it gets the value from M3P, if M3P is also blank, 0, or null, it takes value from M4P. If M4P is also blank, null or 0, it gets the value from M5P, however if M5P value is blank/0/nan, then the value in M2P remains as it is ( the same logic needs to be created if ind is 1,2,3, or 5, that is if ind is 5, then it does not look anywhere else)
So the output of the above should be:
df-out
A B ind M1P M2P M3P M4P M5P
x a 2 0 3 3 5 9
y b 2 Nan 7 Nan 7 11
z c 2 0 3 0 3 3
w d 2 0 8 0 Nan 8
u q 2 0 0 0 Nan 0
I am still struggling to figure what should be the best way to attack this problem in pandas. Not able to understand yet. Any help/codes and ideas will be immensely appreciated.
Use:
#get DataFrame with M + number + P pattern
df1 = df.filter(regex='M\d+P')
#extract numbers and convert to integers
cols = df1.columns.str.extract('(\d+)', expand=False).astype(int)
#compare by ind column
m = cols.to_numpy() == df['ind'].to_numpy()[:, None]
#update back filled missing values - only masked rows
df.update(df1.replace(['Nan', 0, '0'], np.nan).bfill(axis=1).where(m))
print (df)
A B ind M1P M2P M3P M4P M5P
0 x a 2 0 3 3 5 9
1 y b 2 Nan 7 Nan 7 11
2 z c 2 0 3 0 3 3
3 w d 2 0 8.0 0 Nan 8
4 u q 2 0 0 0 Nan 0