Given a Koalas DataFrame:
import databricks.koalas as ks

df = ks.DataFrame({"high_risk": [0, 1, 0, 1, 1],
                   "medium_risk": [1, 0, 0, 0, 0]})
Running a lambda function to get a new column based on the existing column values:
df = df.assign(risk=lambda x: "High" if x.high_risk else ("Medium" if x.medium_risk else "Low"))
df
Out[72]:
   high_risk  medium_risk  risk
0          0            1  High
4          1            0  High
1          1            0  High
2          0            0  High
3          1            0  High
Expected return:
   high_risk  medium_risk    risk
0          0            1  Medium
4          1            0    High
1          1            0    High
2          0            0     Low
3          1            0    High
Why does this assign "High" to every row? The intent is to operate on each row individually. Is the comparison looking at the whole column instead?
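Using assign with a lambda is tricky on a Koalas DataFrame: the callable receives the whole DataFrame rather than one row at a time, so the conditional is evaluated once against entire columns. For your case, I would mul the column 'high_risk' by 2, then add the column 'medium_risk', and finally map the result to labels: 2 becomes 'high' (because the column was multiplied by 2), 1 becomes 'medium', and 0 becomes 'low':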
df = df.assign(risk=df.high_risk.mul(2).add(df.medium_risk)
                      .map({0: 'low', 1: 'medium', 2: 'high'}))
df
   high_risk  medium_risk    risk
0          0            1  medium
1          1            0    high
2          0            0     low
3          1            0    high
4          1            0    high
Note: this fails if a row has 1 in both the 'high_risk' and 'medium_risk' columns, because the sum is then 3, which has no entry in the mapping.
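One possible way around that limitation is to give the combined code 3 its own entry in the mapping. The sketch below checks the encoding (2 * high_risk + medium_risk) on plain Python values rather than on a Koalas DataFrame. The names RISK_LABELS and risk_label are hypothetical, and treating "both flags set" as 'high' is an assumption, not something the question specifies.

```python
# Hypothetical names for illustration; not part of the original answer.
# Assumption: a row with both flags set (code 3) should be labelled 'high'.
RISK_LABELS = {0: "low", 1: "medium", 2: "high", 3: "high"}


def risk_label(high_risk: int, medium_risk: int) -> str:
    """Encode the two 0/1 flags as 2 * high + medium, then look up the label."""
    return RISK_LABELS[2 * high_risk + medium_risk]


# The five rows from the question, plus one extra row with both flags set.
rows = [(0, 1), (1, 0), (0, 0), (1, 0), (1, 0), (1, 1)]
labels = [risk_label(high, medium) for high, medium in rows]
print(labels)
```

If that assumption suits your data, passing the same four-entry dictionary to map in the assign call should cover the extra case.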