I have written a function that takes a condition from a parameters file and, based on that condition, adds a column value, but I keep getting the error TypeError: condition should be a Column.
condition = "type_txt = 'clinic'"
input_df = input_df.withColumn(
    "prm_data_category",
    F.when(condition, F.lit("clinic"))  # this doesn't work
    .when(F.col("type_txt") == 'office', F.lit("office"))  # this works
    .otherwise(F.lit("other")),
)
Is there any way to use the condition as a SQL string, so it is easy to pass via a parameter, instead of as a Column?
You can use a SQL expression via F.expr. F.when expects a Column as its condition, which is why passing a plain string raises the TypeError; F.expr parses the SQL string into a Column:
from pyspark.sql import functions as F

condition = "type_txt = 'clinic'"
input_df1 = input_df.withColumn(
    "prm_data_category",
    F.when(F.expr(condition), F.lit("clinic"))  # expr() turns the SQL string into a Column
    .when(F.col("type_txt") == 'office', F.lit("office"))
    .otherwise(F.lit("other")),
)