import re
import pandas as pd

data = {'Cat': ['A Phaser','A','B Phaser','B','B','B'],
        'L1': ['Phase','xyzss','xyzss','Phase','xyzss','xyzss'],
        'L2': ['xyzss','Stage','xyzss','xyzss','Phase2','xyzss'],
        'L3': ['xyzss','xyzss','xyzss','xyzss','xyzss','Step'],
        }
df = pd.DataFrame(data, columns=['Cat','L1','L2','L3'])

def funt(s):
    # Return the first keyword found in s (case-insensitive); None if nothing matches
    if re.findall(r'Phase', s, re.IGNORECASE):
        return 'Phase'
    elif re.findall(r'Stag', s, re.IGNORECASE):
        return 'Stage'
    elif re.findall(r'Step', s, re.IGNORECASE):
        return 'Step'

# Join L1..L3 into one string per row, then search the joined string
df[['L1','L2','L3']].apply(lambda row: '_'.join(row.values.astype(str)), axis=1).apply(lambda x: funt(x))
Output:
0 Phase
1 Stage
2 None
3 Phase
4 Phase
5 Step
dtype: object
I am wondering whether there is another way to approach this, such as applying findall
across the columns without joining them together first. Thanks in advance!
Select the required columns. Use replace to turn the xyzss placeholders into NaN, then stack and reset the index; the outcome is a pd.Series.
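Option 1: if xyzss does not vary:
df['filter'] = df.iloc[:, 1:4].replace({'xyzss': np.nan}).stack().reset_index(level=1, drop=True)
Dropping only the column level (level=1) keeps each value on its original row label, so a row with no match stays NaN instead of shifting the later values up.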
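Here is a minimal runnable sketch of Option 1 on the question's data. It assumes each row has at most one cell that is not the xyzss placeholder, and it keeps the row labels by dropping only the column level of the stacked index, so a row with no match (row 2 here) ends up as NaN. The explicit dropna() is an addition so the result does not depend on whether the installed pandas version drops NaN during stack().

```python
import numpy as np
import pandas as pd

data = {'Cat': ['A Phaser', 'A', 'B Phaser', 'B', 'B', 'B'],
        'L1': ['Phase', 'xyzss', 'xyzss', 'Phase', 'xyzss', 'xyzss'],
        'L2': ['xyzss', 'Stage', 'xyzss', 'xyzss', 'Phase2', 'xyzss'],
        'L3': ['xyzss', 'xyzss', 'xyzss', 'xyzss', 'xyzss', 'Step']}
df = pd.DataFrame(data)

# Turn the fixed placeholder into NaN, then stack to one entry per (row, column) cell.
# dropna() removes the placeholder cells regardless of the pandas version's stack() behavior.
stacked = df.iloc[:, 1:4].replace({'xyzss': np.nan}).stack().dropna()

# Drop the column level so the index is the original row label again.
# Assumption: at most one surviving value per row, otherwise the index has duplicates
# and the assignment below fails.
df['filter'] = stacked.reset_index(level=1, drop=True)

print(df)
```

Note that replace compares whole cell values, so row 4 comes out as 'Phase2' rather than 'Phase'; normalising that still needs a regex step such as the funt function from the question.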
Option 2: if xyzss varies, with pat being a list of the values to keep:
df.join(pd.Series(df.mask(~df.isin(pat), np.nan).stack().reset_index(level=1, drop=True), name='filter'))
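The answer does not define pat, so the sketch below fills it in with a hypothetical list of the exact cell values to keep, taken from the question's data. It is only an illustration of the mask/stack/join idea under that assumption, again with an explicit dropna() so the result does not depend on the pandas version.

```python
import numpy as np
import pandas as pd

data = {'Cat': ['A Phaser', 'A', 'B Phaser', 'B', 'B', 'B'],
        'L1': ['Phase', 'xyzss', 'xyzss', 'Phase', 'xyzss', 'xyzss'],
        'L2': ['xyzss', 'Stage', 'xyzss', 'xyzss', 'Phase2', 'xyzss'],
        'L3': ['xyzss', 'xyzss', 'xyzss', 'xyzss', 'xyzss', 'Step']}
df = pd.DataFrame(data)

# Hypothetical: pat is not defined in the answer; here it lists the exact values to keep.
pat = ['Phase', 'Stage', 'Phase2', 'Step']

# Replace every cell that is not in pat with NaN (this also blanks the Cat column),
# then stack and drop the NaN cells so only the kept values remain.
kept = df.mask(~df.isin(pat), np.nan).stack().dropna()

# Drop the column level and join the values back onto df by row label.
result = df.join(pd.Series(kept.reset_index(level=1, drop=True), name='filter'))

print(result)
```

Since isin matches whole cell values only, pat has to list every variant explicitly ('Phase2' as well as 'Phase'), and rows with no listed value, such as row 2, get NaN in the filter column.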