I have A dataframe which consists of many 'ratio' columns (67 column: ratio_1 till ratio_67). Based on these columns I want to make a new column in my dataframe with an entry 'pass' or 'fail'
I get stuck on two things, to explain my df I made a simple example:
import random
col3=[0,0,0,0,2,4,6,0,0,0,100,200,300,400]
col4=[0,0,0,0,4,6,8,0,0,0,200,900,400, 500]
listt = [np.nan, np.nan , 1, 1, 1, 5, 5, 1, 1,np.nan ,np.nan ]
col5 = [listt[random.randrange(len(listt))]
for item in range(14)]
col6 = [listt[random.randrange(len(listt))]
for item in range(14)]
col7 = [listt[random.randrange(len(listt))]
for item in range(14)]
col8 = [listt[random.randrange(len(listt))]
for item in range(14)]
df = {'Unit': [1, 1, 1, 1, 2, 2, 2, 3, 4, 5, 6, 6, 6, 6], 'Year': [2014, 2015, 2016, 2017, 2015, 2016, 2017, 2017, 2014, 2015, 2014, 2015, 2016, 2017], 'col3' : col3, 'col4' : col4, 'Ratio1' : col5, 'Ratio2' : col6, 'Ratio2' : col6, 'Ratio3' : col7, 'Ratio4' : col8}
df = pd.DataFrame(data=df)
df = df.groupby(['Unit', 'Year']).sum()
df
In this df i want to make a selection for the rows that have only 0 or 1 values in the ratio columns, should get a pass as comment. And rows that contain in one of the ratio columns some other entry (whatever it is) should get a fail comment.
I try it like this:
[((df_bk_t['Ratio1'] == float(0)) | (df_bk_t['Ratio1'] == float(1)))
& ((df_bk_t['Ratio2'] == float(0)) | (df_bk_t['Ratio2'] == float(1)))
& ((df_bk_t['Ratio3'] == float(0)) | (df_bk_t['Ratio3'] == float(1)))
& ((df_bk_t['Ratio4'] == float(0)) | (df_bk_t['Ratio4'] == float(1)))]
But this method seems very inefficient when i would have to do it for 67 columns. Does anyone know how this can be done more efficient? And how can i make a new column with pass or fail, based on this? Is this method correct for this?
Thank you in advance for help/suggestions,
Jen
Use DataFrame.filter
for columns with column name with Ratio
, create boolean DataFrame by DataFrame.isin
and test if all Trues per row by DataFrame.all
, last set new columns values by numpy.where
:
df['new'] = np.where(df.filter(like='Ratio').isin([0,1]).all(axis=1), 'Fail', 'Pass')
If performance is important use numpy solution:
arr = df.filter(like='Ratio').values
df['new'] = np.where(np.all(np.isin(arr, [0,1]), axis=1), 'Fail', 'Pass')
print (df)
Unit Year
1 2014 0 0 1.0 1.0 0.0 5.0 Pass
2015 0 0 1.0 0.0 1.0 1.0 Fail
2016 0 0 1.0 5.0 1.0 0.0 Pass
2017 0 0 0.0 1.0 1.0 5.0 Pass
2 2015 2 4 5.0 0.0 0.0 1.0 Pass
2016 4 6 1.0 1.0 1.0 5.0 Pass
2017 6 8 0.0 1.0 5.0 5.0 Pass
3 2017 0 0 0.0 0.0 0.0 5.0 Pass
4 2014 0 0 0.0 1.0 0.0 1.0 Fail
5 2015 0 0 1.0 1.0 5.0 1.0 Pass
6 2014 100 200 0.0 5.0 0.0 0.0 Pass
2015 200 900 0.0 1.0 1.0 1.0 Fail
2016 300 400 5.0 1.0 1.0 0.0 Pass
2017 400 500 0.0 0.0 1.0 0.0 Fail