I have the following dataframe:
import pandas as pd
import re
df = pd.DataFrame({'Column_01': ['Press', 'Temp', '', 'Strain gauge', 'Ultrassonic', ''],
                   'Column_02': ['five', 'two', 'five', 'five', 'three', 'three']})
I would like to check two conditions: whether 'Column_01' is filled, and whether 'Column_02' contains one of the words 'one', 'two' or 'three'. If 'Column_01' is filled OR 'Column_02' contains one of those words, a new column ('Classifier') should receive 'SENSOR'.
To find those words in 'Column_02', I implemented the following code:
df['Classifier'] = df.apply(lambda x: 'SENSOR'
if re.search(r'one|two|three', x['Column_02'])
else 'Nan', axis = 1)
This code works: it correctly finds the words in each row. However, I also need to check whether 'Column_01' is filled, and I haven't managed to do that with notnull().
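A likely reason notnull() doesn't help here: the empty cells in 'Column_01' are empty strings '', not NaN, and notnull() only flags real missing values (NaN/None). A minimal sketch of the difference on a two-element Series (the Series s is just for illustration):

```python
import pandas as pd

# One filled cell and one empty string (an empty string is NOT a missing value)
s = pd.Series(['Press', ''])

# notnull() only detects NaN/None, so '' is reported as "filled"
print(s.notnull().tolist())  # [True, True]

# Comparing against '' is what distinguishes the empty cell
print(s.ne('').tolist())     # [True, False]
```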
I would like the output to be:
Column_01     Column_02  Classifier
Press         five       SENSOR      # Column_01 is filled
Temp          two        SENSOR      # Column_01 is filled; contains 'two'
              five       Nan         # neither condition holds
Strain gauge  five       SENSOR      # Column_01 is filled
Ultrassonic   three      SENSOR      # Column_01 is filled; contains 'three'
              three      SENSOR      # Column_01 is empty, but contains 'three'
In general, you should avoid .apply() with axis=1 when a vectorized alternative exists
(ref https://stackoverflow.com/a/54432584/11610186 ).
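For comparison, the row-wise code from the question only needs an explicit check on 'Column_01' inside the lambda. A minimal sketch, assuming empty cells are either '' or NaN (pd.notna covers the NaN case); it keeps the question's 'Nan' string as the fallback value:

```python
import re
import pandas as pd

df = pd.DataFrame({'Column_01': ['Press', 'Temp', '', 'Strain gauge', 'Ultrassonic', ''],
                   'Column_02': ['five', 'two', 'five', 'five', 'three', 'three']})

# Row-wise: 'SENSOR' if Column_01 holds text OR Column_02 contains one/two/three
df['Classifier'] = df.apply(
    lambda x: 'SENSOR'
    if (pd.notna(x['Column_01']) and x['Column_01'] != '')
    or re.search(r'one|two|three', x['Column_02'])
    else 'Nan',
    axis=1)

print(df)
```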
This should do the trick:
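import numpy as np
# 'SENSOR' where Column_01 is non-empty (NaN treated as empty) OR Column_02 contains one/two/three;
# np.where returns a single-dtype array, so np.nan is stored here as the string 'nan'
df["Classifier"] = np.where(df["Column_01"].fillna('').ne('') | df["Column_02"].str.contains("one|two|three"), "SENSOR", np.nan)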
Outputs:
      Column_01  Column_02  Classifier
0         Press       five      SENSOR
1          Temp        two      SENSOR
2                     five         nan
3  Strain gauge       five      SENSOR
4   Ultrassonic      three      SENSOR
5                    three      SENSOR
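If you need a real missing value in 'Classifier' rather than the text 'nan' (np.where typically converts np.nan to a string when the other branch is a string), a pandas-only sketch using Series.where could look like this. The names filled and has_word are just illustrative:

```python
import pandas as pd

df = pd.DataFrame({'Column_01': ['Press', 'Temp', '', 'Strain gauge', 'Ultrassonic', ''],
                   'Column_02': ['five', 'two', 'five', 'five', 'three', 'three']})

# True where Column_01 has text (NaN is treated as empty via fillna)
filled = df['Column_01'].fillna('').ne('')
# True where Column_02 contains one of the words (no capture groups in the pattern)
has_word = df['Column_02'].str.contains(r'one|two|three')

# Keep 'SENSOR' where either condition holds; Series.where puts NaN everywhere else
df['Classifier'] = pd.Series('SENSOR', index=df.index).where(filled | has_word)

print(df)
```

Note that 'one|two|three' also matches inside longer words (e.g. 'someone'); if only whole words should count, a stricter pattern such as r'\b(?:one|two|three)\b' may be closer to what you need.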