Tags: apache-spark, pyspark, distributed-computing, databricks

Do User Defined Functions (UDFs) in Spark work in a distributed way?


Do User Defined Functions (UDFs) in Spark work in a distributed way when the data is stored on different nodes, or is all of the data accumulated on the master (driver) node for processing? If they work in a distributed way, can any Python function, whether pre-defined or user-defined, be converted into a Spark UDF as shown below:

spark.udf.register("myFunctionName", functionNewName)


Solution

  • A Spark DataFrame is distributed across the cluster in partitions. The UDF is applied to each partition on the executor that holds it, rather than the data being collected to the driver, so the answer is yes: UDFs run in a distributed way. You can confirm this in the Spark UI by looking at the tasks launched for the stage that runs the UDF.