Tags: apache-spark, pyspark, azure-databricks, databricks-sql

How to include multiple expressions in a case when statement with Databricks PySpark


The following PySpark code works fine when adding a single case when expression:

%python
from pyspark.sql.functions import expr
df = sql("select * from xxxxxxx.xxxxxxx")
transformWithCol = (df.withColumn("MyTestName", expr("case when first_name = 'Peter' then 1 else 0 end")))

However, I would like to add another case when statement to the same withColumn as follows:

%python
from pyspark.sql.functions import expr
df = sql("""select * from retailrpt.vw_fund_managers""")
transformWithCol = (df
                    .withColumn("MyTestName", expr("case when first_name = 'Peter' then 1 else 0 end"), expr("case when last_name = 'Jones' then 5 else 4 end")))

I get the error:

TypeError: withColumn() takes 3 positional arguments but 4 were given

Is it not possible to add multiple case statements to a withColumn?


Solution

    • To apply multiple conditions, you can use expr in the following way. The following is my dataframe:
    data = [[1,'Ana','Anathan'],[2,'Topson','Topias'],[3,'Ceb','Seb']]
    df = spark.createDataFrame(data=data,schema=['id','gname','aname'])
    df.show()
    


    • I got the same error when I tried to use similar code on my dataframe:
    from pyspark.sql.functions import expr
    
    df1 = (df.withColumn("MyTestName", expr("case when gname = 'Ana' then 1 else 0 end"), expr("case when aname = 'Seb' then 5 else 4 end")))
    df1.show()
    


    • Now, use the following code to combine the conditions. A single case expression can contain multiple when branches, but only one else branch:
    from pyspark.sql.functions import expr
    
    df1 = df.withColumn("MyTestName", expr("case when gname = 'Ana' then 1 when aname = 'Seb' then 2 else 0 end"))
    df1.show()
    
