I want to debug Spark code in PyCharm because it is easier to debug there. But I need to add spark-redis.jar to the classpath; otherwise I get `Failed to find data source: redis`.
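The code that connects to Redis is:

```python
from pyspark.sql import SparkSession

# self.redis_host and self.redis_port are attributes of the enclosing consumer class
spark = SparkSession \
    .builder \
    .appName("Streaming Image Consumer") \
    .config("spark.redis.host", self.redis_host) \
    .config("spark.redis.port", self.redis_port) \
    .getOrCreate()
```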
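Because PyCharm runs the script with a plain Python interpreter rather than through `spark-submit`, one option worth trying is to pass the jar through the session builder itself using Spark's `spark.jars` setting (a comma-separated list of jars for the driver and executor classpaths). In recent PySpark versions, builder settings are forwarded to the JVM that `getOrCreate()` launches, but only if no SparkContext is running yet. This is a minimal sketch under that assumption; the jar path and the `redis_session_options` / `create_session` helpers are hypothetical.

```python
# Hypothetical path: point this at your spark-redis jar-with-dependencies.
REDIS_JAR = "/path/to/spark-redis-jar-with-dependencies.jar"


def redis_session_options(redis_host, redis_port, jar_path=REDIS_JAR):
    """Collect the Spark settings for a Redis-backed session.

    spark.jars puts the jar on the driver and executor classpaths.
    """
    return {
        "spark.jars": jar_path,
        "spark.redis.host": redis_host,
        "spark.redis.port": str(redis_port),
    }


def create_session(options, app_name="Streaming Image Consumer"):
    # PySpark is imported here so the helper above can be used without it.
    from pyspark.sql import SparkSession

    builder = SparkSession.builder.appName(app_name)
    for key, value in options.items():
        builder = builder.config(key, value)
    # The options only reach the JVM if this call creates the first SparkContext.
    return builder.getOrCreate()
```

Inside the consumer class this would be called as `spark = create_session(redis_session_options(self.redis_host, self.redis_port))`.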
How can I fix this when running the code from PyCharm?
I have tried adding `spark.driver.extraClassPath` to `$SPARK_HOME/conf/spark-defaults.conf`, but it does not work.
I also tried setting the environment variable `PYSPARK_SUBMIT_ARGS` to `--jars ...` in the run configuration, but that raises a different error.
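A frequent cause of errors with `PYSPARK_SUBMIT_ARGS` is leaving out the trailing `pyspark-shell` token, which PySpark expects at the end of the value; without it the Java gateway fails to start. Whether that is the error seen here is an assumption. A minimal sketch that sets the variable from Python before any session exists (the jar path and `set_pyspark_submit_args` helper are hypothetical):

```python
import os

# Hypothetical path: point this at your spark-redis jar-with-dependencies.
REDIS_JAR = "/path/to/spark-redis-jar-with-dependencies.jar"


def set_pyspark_submit_args(jar_path):
    """Make the JVM that PySpark launches load jar_path.

    Must run before the first SparkContext/SparkSession is created,
    because the variable is only read when that JVM is started.
    The value has to end with "pyspark-shell".
    """
    args = "--jars {} pyspark-shell".format(jar_path)
    os.environ["PYSPARK_SUBMIT_ARGS"] = args
    return args
```

Call `set_pyspark_submit_args(REDIS_JAR)` at the top of the script, then build the session as in the question. Putting the same value (including `pyspark-shell`) into the run configuration's environment variables should have the same effect.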
Adding `spark.driver.extraClassPath` to `spark-defaults.conf` works for me with Spark 2.3.1:
```
cat /Users/oleksiidiagiliev/Soft/spark-2.3.1-bin-hadoop2.7/conf/spark-defaults.conf
spark.driver.extraClassPath /Users/oleksiidiagiliev/.m2/repository/com/redislabs/spark-redis/2.3.1-SNAPSHOT/spark-redis-2.3.1-SNAPSHOT-jar-with-dependencies.jar
```
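If the setting seems to have no effect, a quick sanity check is that every path listed in `spark.driver.extraClassPath` actually exists. This sketch parses only the whitespace-separated `key value` form shown above (Spark also accepts Java-properties `key=value` lines, which it does not handle); the helper names are hypothetical.

```python
import os


def parse_spark_defaults(lines):
    """Parse spark-defaults.conf lines written as "key value".

    Blank lines and '#' comments are skipped. Simplification: "key=value"
    lines, which Spark also accepts, are not handled here.
    """
    conf = {}
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split(None, 1)
        conf[parts[0]] = parts[1].strip() if len(parts) > 1 else ""
    return conf


def missing_classpath_entries(conf):
    """Return spark.driver.extraClassPath entries that do not exist on disk."""
    classpath = conf.get("spark.driver.extraClassPath", "")
    return [p for p in classpath.split(os.pathsep) if p and not os.path.exists(p)]
```

Pass an open `$SPARK_HOME/conf/spark-defaults.conf` file object to `parse_spark_defaults`; an empty list from `missing_classpath_entries` only means the paths exist. PySpark still has to be using that same `SPARK_HOME` for the file to be read at all.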
Note that this is a jar with dependencies (you can build one from source with `mvn clean install -DskipTests`).
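A plain (thin) spark-redis jar does not bundle its dependencies, such as the Jedis client, so it may load and then fail with missing-class errors. Since a jar is a zip archive, one way to confirm you picked the right artifact is to look for both packages inside it. The two package prefixes below are assumptions based on where spark-redis and Jedis keep their classes, and `missing_packages` is a hypothetical helper.

```python
import zipfile

# Assumed package prefixes: spark-redis classes and its Jedis dependency.
REQUIRED_PREFIXES = ("com/redislabs/provider/redis/", "redis/clients/jedis/")


def missing_packages(jar):
    """Return the required package prefixes that have no entries in the jar.

    `jar` may be a filesystem path or a binary file-like object.
    """
    with zipfile.ZipFile(jar) as zf:
        names = zf.namelist()
    return [p for p in REQUIRED_PREFIXES if not any(n.startswith(p) for n in names)]
```

For a jar-with-dependencies the result should be an empty list; a thin jar would typically report the Jedis prefix as missing.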
Also, I added the pyspark libraries and the `SPARK_HOME` environment variable to the PyCharm project, as described here: https://medium.com/parrot-prediction/integrating-apache-spark-2-0-with-pycharm-ce-522a6784886f
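The PyCharm project setup described above (PySpark libraries plus `SPARK_HOME`) can also be approximated in code, which is roughly what the third-party `findspark` package does. A sketch, assuming the standard layout of a Spark binary distribution (`$SPARK_HOME/python` plus `python/lib/py4j-*-src.zip`); the helper names are hypothetical.

```python
import glob
import os
import sys


def pyspark_paths(spark_home):
    """Paths that make PySpark importable from a Spark distribution:
    $SPARK_HOME/python plus the bundled py4j source zip (if present)."""
    python_dir = os.path.join(spark_home, "python")
    py4j_zips = sorted(glob.glob(os.path.join(python_dir, "lib", "py4j-*-src.zip")))
    return [python_dir] + py4j_zips


def add_pyspark_to_path(spark_home=None):
    """Set SPARK_HOME and prepend the PySpark paths to sys.path.

    Falls back to the existing SPARK_HOME variable (KeyError if unset).
    """
    spark_home = spark_home or os.environ["SPARK_HOME"]
    os.environ["SPARK_HOME"] = spark_home
    # Insert in reverse so the final order matches pyspark_paths().
    for path in reversed(pyspark_paths(spark_home)):
        if path not in sys.path:
            sys.path.insert(0, path)
    return spark_home
```

Calling `add_pyspark_to_path("/Users/oleksiidiagiliev/Soft/spark-2.3.1-bin-hadoop2.7")` before importing `pyspark` makes the script use that distribution, and therefore the `spark-defaults.conf` in its `conf` directory.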