I have a properly synced pyspark client / spark installation: both versions are 3.3.1 (shown below). The full exception message is:
py4j.Py4JException: Constructor org.apache.spark.api.python.PythonFunction([class [B, class java.util.HashMap, class java.util.ArrayList, class java.lang.String, class java.lang.String, class java.util.ArrayList, class org.apache.spark.api.python.PythonAccumulatorV2]) does not exist
This has been identified in another Stack Overflow post as most likely due to a version mismatch between the pyspark invoker/caller and the spark backend. I agree that seems the likely cause, but I have carefully verified that both sides of the equation are equal: pyspark and spark are the same version:
Python 3.10.13 (main, Aug 24 2023, 22:48:59) [Clang 14.0.3 (clang-1403.0.22.14.1)]
In [1]: import pyspark
In [2]: print(f"PySpark version: {pyspark.__version__}")
PySpark version: 3.3.1
Spark was installed by downloading the version 3.3.1 .tgz directly from the Apache site and extracting it. SPARK_HOME was pointed to that directory and $SPARK_HOME/bin was added to the path.
$ spark-shell --version
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 3.3.1
      /_/
Inside the python script the version has been verified as well:
pyspark version: 3.3.1
But the script blows up with a pyspark / spark error:
An error occurred while calling None.org.apache.spark.api.python.PythonFunction
py4j.Py4JException: Constructor org.apache.spark.api.python.PythonFunction([class [B, class java.util.HashMap, class java.util.ArrayList, class java.lang.String, class java.lang.String, class java.util.ArrayList, class org.apache.spark.api.python.PythonAccumulatorV2]) does not exist
    at py4j.reflection.ReflectionEngine.getConstructor(ReflectionEngine.java:180)
So what else might be going on here? Is there some way I'm not seeing in which the versions of spark/pyspark might be out of sync?
This turned out to be a PyCharm situation. It looks like I had not restarted it after switching between versions of spark, so it remembered an earlier version, the homebrew default of 3.5.0.
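A stale IDE environment like this can be caught from inside the script itself, since that is the process that actually sees the IDE's SPARK_HOME. The following is only a minimal sketch: the helper names spark_home_version and versions_match are hypothetical, and it assumes a standard Apache distribution layout where the jars directory contains a file named like spark-core_2.12-3.3.1.jar.

```python
import os
import re
from pathlib import Path


def spark_home_version(spark_home):
    """Infer the Spark version from the spark-core jar under <spark_home>/jars.

    Hypothetical helper: assumes the jar is named spark-core_<scala>-<spark>.jar,
    as in the standard Apache .tgz distribution. Returns None if nothing matches.
    """
    jars = Path(spark_home) / "jars"
    if not jars.is_dir():
        return None
    for jar in sorted(jars.glob("spark-core_*.jar")):
        match = re.match(r"spark-core_[\d.]+-(\d+\.\d+\.\d+)", jar.name)
        if match:
            return match.group(1)
    return None


def versions_match(pyspark_version, spark_home):
    """True only if the jars under spark_home report the same version as pyspark."""
    return spark_home_version(spark_home) == pyspark_version


if __name__ == "__main__":
    try:
        import pyspark
        client_version = pyspark.__version__
    except ImportError:
        client_version = None  # pyspark is not importable in this interpreter

    # This is the SPARK_HOME seen by *this* process, which may differ from the shell's.
    home = os.environ.get("SPARK_HOME")
    print(f"pyspark version: {client_version}")
    print(f"SPARK_HOME:      {home}")
    if home:
        print(f"Spark jars:      {spark_home_version(home)}")
        print(f"versions match:  {versions_match(client_version, home)}")
```

Running this from the same PyCharm run configuration as the failing script, rather than from a terminal, is the point: if the IDE is still holding an old SPARK_HOME, the jar version printed here will differ from the pyspark version even though spark-shell --version looks correct in the shell.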