Tags: azure, pyspark, databricks

How do I register a complex function like the one below as a UDF in PySpark?


I have a function that takes a DataFrame as a parameter, calculates the count and percentage of NULL values in each column, and returns a DataFrame with column_name, the NULL value count, and the NULL percentage. How can I register it as a UDF in PySpark so that I can take advantage of Spark's distributed processing? I am using Spark 3.3.0.

This is my function:

(The function was posted as an image, which is not reproduced here.)

I couldn't find any method or implementation for complex functions, or for functions that operate on a whole DataFrame.


Solution

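  • The function you wrote is not a UDF, and it does not need to be one. It calls Spark DataFrame methods internally, and Spark already executes those methods in a distributed, optimized way. A UDF works on individual column values, not on a whole DataFrame, so it cannot take a DataFrame as its input. Just call your function directly on the DataFrame.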