I have:
df = pd.DataFrame({'col1': {0: 'success', 1: 'failed', 2: 'variable x 10', 3: 'variable xr 134', 4: 'error', 5: 'not found'}})
I want to create 3 columns depending on if a substring is present in the main column.
variable
, then I want to split the string and take the 2nd elementvariable
, then I want to split the string and take the 3rd element. this value can also be a stringsuccess
, fail
, error
, or not found
, then I want to place a key word of Success
, Fail
, or Error
.None
or ''
to appearI tried:
df['var']=np.where(df['col1'].str.contains('variable'), df['col1'].str.split(' '),'')
df['val']=np.where(df['col1'].str.contains('variable'), df['col1'].str.split(' '),'')
df['other']=np.where(df['col1'].str.contains('not found'|'error'),'Error',
np.where(df['col1'].str.contains('success'),'Success',
np.where(df['col1'].str.contains('fail'),'Fail','')))
but I get the error unsupported operand type(s) for |: 'str' and 'str'
with the str.contains
and cannot select specific portions of the split string, as I receive the error operands could not be broadcast together with shapes (6,) (3,) ()
when trying df['val']=np.where(df['col1'].str.contains('variable'), df['col1'].str.split(' ')[2],'')
Pictorially, this is what I would like to end up with
Any suggestions?
I would extract
each of the three portions (if any), then concat
the whole thing :
others = {
"success": "Success", "fail": "Fail",
"error": "Error", "not found": "Error"
}
pat_vax = r"variable\s+(?P<var>\S+)\s+(?P<val>\S+)"
pat_oth = r"(%s)" % "|".join(others)
out = (
pd.concat([
df, df["col1"].str.extract(pat_vax),
df["col1"].str.extract(pat_oth, expand=False)
.map(others).rename("other")], axis=1
)
)
Output :
print(out)
col1 var val other
0 success NaN NaN Success
1 failed NaN NaN Fail
2 variable x 10 x 10 NaN
3 variable xr 134r xr 134r NaN
4 error NaN NaN Error
5 not found NaN NaN Error
[6 rows x 4 columns]