python, apache-spark, pyspark, apache-spark-sql, user-defined-functions

Are Python UDFs still inefficient in Spark?


I'm reading the book "Spark: The Definitive Guide: Big Data Processing Made Simple", which came out in 2018. It is now 2023. The book says that UDFs written in Python are inefficient, and that the same goes for running Python code on RDDs. Is that still true?


Solution

  • This is old knowledge, but it still applies, as explained below: