
pyspark: TypeError: condition should be a Column with when/otherwise


I have written a function that takes a condition from a parameters file and, based on that condition, adds a column value; but I constantly get the error TypeError: condition should be a Column.

from pyspark.sql import functions as F

condition = "type_txt = 'clinic'"
input_df = input_df.withColumn(
    "prm_data_category",
    F.when(condition, F.lit("clinic"))  # this doesn't work: condition is a plain string, not a Column
    .when(F.col("type_txt") == 'office', F.lit("office"))  # this works
    .otherwise(F.lit("other")),
)

Is there any way to pass the condition as a SQL string parameter instead of building a Column expression?


Solution

  • You can parse a SQL expression string into a Column using F.expr:

    from pyspark.sql import functions as F

    condition = "type_txt = 'clinic'"
    input_df1 = input_df.withColumn(
        "prm_data_category",
        F.when(F.expr(condition), F.lit("clinic"))
        .when(F.col("type_txt") == 'office', F.lit("office"))
        .otherwise(F.lit("other")),
    )
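
F.expr parses the SQL string into a Column, so any predicate you could write in a WHERE clause works here. For reference, a minimal end-to-end sketch; the SparkSession setup and the sample rows are assumptions added for illustration:

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.getOrCreate()

    # Hypothetical sample data, for illustration only
    input_df = spark.createDataFrame(
        [("clinic",), ("office",), ("hospital",)],
        ["type_txt"],
    )

    condition = "type_txt = 'clinic'"
    result = input_df.withColumn(
        "prm_data_category",
        F.when(F.expr(condition), F.lit("clinic"))
        .when(F.col("type_txt") == 'office', F.lit("office"))
        .otherwise(F.lit("other")),
    )
    result.show()
    # type_txt 'clinic'  -> prm_data_category 'clinic'
    # type_txt 'office'  -> prm_data_category 'office'
    # anything else      -> prm_data_category 'other'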
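
Since the goal is to drive the logic from a parameters file, the entire CASE WHEN mapping can also be passed as a single SQL string and parsed with F.expr (a sketch; the parameter string itself is an assumption):

    # The whole branching logic as one SQL string, e.g. read from the parameters file
    case_expr = """
        CASE WHEN type_txt = 'clinic' THEN 'clinic'
             WHEN type_txt = 'office' THEN 'office'
             ELSE 'other'
        END
    """
    input_df1 = input_df.withColumn("prm_data_category", F.expr(case_expr))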