Search code examples
dataframescalaapache-spark

Spark Scala Dataframe case when like function


I am using spark scala with DataFrame API, trying to convert the below sql logic

   CASE 
     WHEN col_1 like '%XYZ' OR col_1 like '%ZYX' THEN 
       CASE WHEN col_2 like '%TTT' THEN 'ABC' ELSE 'BBA' END
     WHEN col_1 not like '%XYZ' OR col_1 not like '%ZYX' 
       CASE WHEN col_2 like '%YYY' THEN BBC' END
   END as new_col 

How to construct CASE WHEN with multiple like and not like conditions with spark scala dataframe api?


Solution

  • Use the expr function and pass the whole case statement in it as below.

    import org.apache.spark.sql.functions._
    val df=Seq(
          ("A","01/01/2022",1), ("AXYZ","02/01/2022",1), ("AZYX","03/01/2022",1),("AXYZ","04/01/2022",0), ("AZYX","05/01/2022",0),("AB","06/01/2022",1), ("A","07/01/2022",0) ).toDF("Category", "date", "Indictor")
    
    
    df.select(col("*"),expr("""CASE WHEN Category like '%XYZ' OR Category like '%ZYX' THEN 
                                       CASE WHEN Indictor = 1 THEN 'ABC' ELSE 'BBA' END
                                    WHEN Category not like '%XYZ' OR Category not like '%ZYX' then
                                        CASE WHEN Indictor = 1 THEN 'BBC' ELSE 'BBD' END
                                    END""").alias("new_col")).show()
    
    +--------+----------+--------+-------+
    |Category|      date|Indictor|new_col|
    +--------+----------+--------+-------+
    |       A|01/01/2022|       1|    BBC|
    |    AXYZ|02/01/2022|       1|    ABC|
    |    AZYX|03/01/2022|       1|    ABC|
    |    AXYZ|04/01/2022|       0|    BBA|
    |    AZYX|05/01/2022|       0|    BBA|
    |      AB|06/01/2022|       1|    BBC|
    |       A|07/01/2022|       0|    BBD|
    +--------+----------+--------+-------+