I'm reading the book *Spark: The Definitive Guide – Big Data Processing Made Simple*, which came out in 2018, and it's now 2023. The book says that UDFs written in Python are inefficient, and the same goes for running Python code on RDDs. Is that still true?
It's old knowledge, but it still applies:
UDFs are slower in general because Catalyst cannot optimize them: they are black boxes to the optimizer. See https://www.codemotion.com/magazine/ai-ml/big-data/light-up-the-spark-in-catalyst-by-avoiding-udf/
As for Python vs. Scala UDFs: yes, Python is slower, because every row has to be serialized out to a Python worker process and back. https://medium.com/quantumblack/spark-udf-deep-insights-in-performance-f0a95a4d8c62 gives good insight into this. I mainly work in Scala, so this is not my key area, but that post summarizes it well.
As for RDDs: they cannot be optimized by Catalyst at all, whether you use PySpark or Scala. You should be using DataFrames or Datasets instead.