Tags: python, pyspark, pip, jar, log4j

How to upgrade a jar file dependency within the Python 3.9 PySpark package?


How can I upgrade a jar file within a Python package? I need to upgrade to the latest version of log4j within pyspark.

/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/pyspark/jars/log4j-1.2.17.jar

I tried upgrading PySpark with the command below, invoking pip through python3.9 because I have multiple Python versions installed. I am a novice at managing Python installations. Any pointers? Thanks.

python3.9 -m pip install pyspark --upgrade

(On macOS)


Solution

  • As of this writing, the latest version of pyspark (3.2.1, from 26 January 2022) ships with log4j-1.2.17.jar. The jar is bundled directly in the tar.gz that pip downloads, extracts, and installs, so it is not a separate pip dependency and cannot be upgraded on its own; there is no automated way to do it.

    You might be able to simply replace the .jar file manually, but I suspect you would run into issues wherever the API of the newer version differs from that of 1.2.17. If a newer version were compatible, the developers of the package would probably have used it already.
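
Before replacing anything by hand, it may help to confirm which log4j jars a given interpreter's pyspark install actually bundles. The following is a minimal sketch, not an official PySpark tool: the helper names `bundled_jars` and `pyspark_package_dir` are made up for illustration, and the only assumption about the package layout is the `jars` subdirectory visible in the path from the question.

```python
import importlib.util
from pathlib import Path


def bundled_jars(package_dir, prefix="log4j"):
    """Return the sorted names of .jar files in <package_dir>/jars that start with prefix.

    Hypothetical helper; assumes the jars live in a 'jars' subdirectory,
    as in the site-packages/pyspark/jars path shown in the question.
    """
    jars_dir = Path(package_dir) / "jars"
    if not jars_dir.is_dir():
        return []
    return sorted(p.name for p in jars_dir.glob(f"{prefix}*.jar"))


def pyspark_package_dir():
    """Locate the installed pyspark package without importing it; None if not found."""
    spec = importlib.util.find_spec("pyspark")
    if spec is None or spec.origin is None:
        return None
    # spec.origin points at pyspark/__init__.py; its parent is the package directory.
    return Path(spec.origin).parent


if __name__ == "__main__":
    pkg = pyspark_package_dir()
    if pkg is None:
        print("pyspark is not installed for this interpreter")
    else:
        print(pkg)
        for name in bundled_jars(pkg):
            print("  ", name)
```

Running the script with the same interpreter that pip targeted (here, python3.9) shows whether the upgrade changed the bundled jar for that installation rather than for a different Python version.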