I have a function that takes a DataFrame as a parameter, calculates the NULL value count and NULL value percentage for every column, and returns a DataFrame with the columns column_name, null_count and null_percentage. How can I register it as a UDF in PySpark so that I can take advantage of Spark's distributed processing? I am using Spark 3.3.0.
This is my function:
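Roughly, it looks like this (a simplified sketch; the function and column names here are illustrative):

```python
from pyspark.sql import functions as F

def null_report(df):
    """Return a DataFrame with the NULL count and NULL percentage of every column."""
    total = df.count()
    # Count the NULLs of every column in a single aggregation pass
    null_counts = df.select([
        F.count(F.when(F.col(c).isNull(), c)).alias(c) for c in df.columns
    ]).collect()[0]
    rows = [
        (c, null_counts[c], round(null_counts[c] / total * 100, 2) if total else 0.0)
        for c in df.columns
    ]
    return df.sparkSession.createDataFrame(
        rows, ["column_name", "null_count", "null_percentage"]
    )
```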
I couldn't find any method or implementation for complex functions, or for functions that operate on a whole DataFrame.
The function you wrote is not a UDF, and it doesn't need to be one. A UDF is a plain Python function that Spark applies to column values row by row; it cannot take a whole DataFrame as its input. Your function uses Spark's own DataFrame methods internally, and those are already optimized and executed in a distributed way, so wrapping them in a UDF would not make them faster (plain Python UDFs are typically slower, because every value has to be serialized between the JVM and Python).
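For contrast, this is the kind of thing a UDF is for: a scalar Python function applied to column values, one row at a time (a toy example with made-up names):

```python
from pyspark.sql import functions as F
from pyspark.sql.types import IntegerType

# A real UDF: a plain Python function that Spark applies to each row's value.
@F.udf(returnType=IntegerType())
def name_length(s):
    return len(s) if s is not None else None

# UDFs are used on columns inside select/withColumn, never on a whole DataFrame:
# df.withColumn("name_len", name_length(F.col("name")))
```

Your null-report function should simply be called as a normal Python function on the DataFrame; the select/when/count calls inside it already run on Spark's optimized engine across the cluster, so there is nothing to register and nothing to gain from forcing it through the UDF mechanism.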