pyspark

Passing arguments to pyspark udf


I want to pass two arguments (say x and y) to a PySpark UDF.

#I want to pass x and y as arguments
@udf (returnType=StringType())
def my_udf(str,x,y):
    return some_result
#Now call the udf on the pyspark dataframe (df)
#I don't know how to pass the two arguments x and y here when calling the udf
df.withColumn('new_col_name',my_udf(df.col,x,y))


Solution

  • To pass variables to a PySpark UDF, you can use the lit function from the pyspark.sql.functions module. It wraps a constant value in a literal column, so the constant can be passed to the UDF like any other column argument.

    from pyspark.sql.functions import lit, udf
    from pyspark.sql.types import StringType
    
    @udf(returnType=StringType())
    def my_udf(str, x, y):
        return some_result
    
    # Call the udf on the pyspark dataframe (df), wrapping the constants x and y in lit()
    df.withColumn('new_col_name', my_udf(df.col, lit(x), lit(y)))
    

    Hope this helps.
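
For a fuller picture, here is a minimal end-to-end sketch of the lit approach. The function join_with, the sample data, and the values chosen for x and y are all hypothetical and only there for illustration; the real logic of my_udf is not given in the question. The per-row logic is kept in a plain Python function so it can be checked without starting Spark.

```python
def join_with(value, sep, suffix):
    """Hypothetical per-row logic: append suffix to value, separated by sep."""
    if value is None:
        return None  # a SQL NULL reaches a Python UDF as None
    return f"{value}{sep}{suffix}"


def run_demo():
    # Imported here so join_with stays testable without PySpark installed.
    from pyspark.sql import SparkSession
    from pyspark.sql.functions import col, lit, udf
    from pyspark.sql.types import StringType

    spark = (
        SparkSession.builder.master("local[1]")
        .appName("udf-lit-demo")
        .getOrCreate()
    )
    # Sample data (assumed): a single string column named "col".
    df = spark.createDataFrame([("a",), ("b",), (None,)], "col string")

    # Wrap the plain function as a UDF that returns a string.
    my_udf = udf(join_with, StringType())

    x, y = "-", "z"  # constants to pass along with every row
    # UDF arguments are treated as columns (a bare string would be read as
    # a column name), so the constants are wrapped in lit().
    df.withColumn("new_col_name", my_udf(col("col"), lit(x), lit(y))).show()

    spark.stop()

# Call run_demo() in an environment where PySpark and a JVM are available.
```

Inside the UDF, x and y arrive as ordinary Python values on every row, so join_with can treat them like any other argument.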