Databricks Koalas: Column Assignment Based on Another Column's Value Using a Lambda Function

Given a Koalas DataFrame:

import databricks.koalas as ks

df = ks.DataFrame({"high_risk": [0, 1, 0, 1, 1],
                   "medium_risk": [1, 0, 0, 0, 0]
                   })

Running a lambda function to get a new column based on the existing column values:

df = df.assign(risk=lambda x: "High" if x.high_risk else ("Medium" if x.medium_risk else "Low"))
df
Out[72]: 
   high_risk  medium_risk  risk
0          0            1  High
4          1            0  High
1          1            0  High
2          0            0  High
3          1            0  High

Expected return:

       high_risk  medium_risk  risk
    0          0            1  Medium
    4          1            0  High
    1          1            0  High
    2          0            0  Low
    3          1            0  High

Why does this assign "High" to every row? The intent is to operate on each row. Is the lambda looking at the whole column in the comparison?

Solution

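The lambda passed to assign receives the whole DataFrame, so x.high_risk is an entire column rather than a single row's value. Using assign with a row-wise condition on a Koalas DataFrame does not seem easy to me, but for your case I would multiply the column 'high_risk' by 2, add the column 'medium_risk', and finally map the result to labels: 2 becomes 'high' (because the column was multiplied by 2), 1 becomes 'medium', and 0 becomes 'low', such as: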

df = df.assign(risk=df.high_risk.mul(2).add(df.medium_risk)
                      .map({0: 'low', 1: 'medium', 2: 'high'}))
df
   high_risk  medium_risk    risk
0          0            1  medium
1          1            0    high
2          0            0     low
3          1            0    high
4          1            0    high

Note: this would fail if a row has 1 in both the high_risk and medium_risk columns, because the sum would then be 3, which has no entry in the mapping.