Pyspark + Redis Remote Server

I have a server with Redis and Maven configured. I then create the following SparkSession:

spark = pyspark
.config("", "XX.XXX.XXX.XXX")
.config("spark.redis.port", "6379")
.config("spark.redis.auth", "XXXX")

I am trying to connect to a remote Redis server and write data to and load data from it. However, when I call .save() with the following options:

.option("table", "df")
.option("key.column", "case_id")

I get the following error:

py4j.protocol.Py4JJavaError: An error occurred while calling : java.lang.ClassNotFoundException: Failed to find data source: org.apache.spark.sql.redis. Please find packages at

Is there a fix for this?


  • In addition to @fe2s's answer: instead of loading the package from disk or network storage, it can also be loaded directly from Maven:

    bin/pyspark --packages com.redislabs:spark-redis:2.4.0

    The --packages and --jars arguments can also be used with the regular spark-submit command.
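    If the session is created from a Python script rather than from the pyspark shell, the same Maven coordinate can probably be passed through Spark's spark.jars.packages setting instead of the command line. The following is only a minimal sketch under some assumptions: the helper names (redis_spark_conf, build_session, save_to_redis) are made up for illustration, spark.redis.host is assumed to be the host key the question leaves blank, and version 2.4.0 is simply the one from the command above and may need to match your Spark/Scala build.

```python
# Maven coordinate taken from the --packages command above.
SPARK_REDIS_PACKAGE = "com.redislabs:spark-redis:2.4.0"


def redis_spark_conf(host, port="6379", auth=None):
    """Collect the Spark settings for a session that can use spark-redis.

    Hypothetical helper: it only builds a dict and does not need pyspark.
    """
    conf = {
        # Ask Spark to resolve the connector from Maven when the session starts.
        "spark.jars.packages": SPARK_REDIS_PACKAGE,
        # Assumption: this is the host key elided in the question.
        "spark.redis.host": host,
        "spark.redis.port": str(port),
    }
    if auth is not None:
        conf["spark.redis.auth"] = auth
    return conf


def build_session(conf):
    """Create (or reuse) a SparkSession with the given settings."""
    # Imported here so that redis_spark_conf can be used without pyspark.
    from pyspark.sql import SparkSession

    builder = SparkSession.builder
    for key, value in conf.items():
        builder = builder.config(key, value)
    return builder.getOrCreate()


def save_to_redis(df, table, key_column):
    """Write a DataFrame through the spark-redis data source."""
    (
        df.write.format("org.apache.spark.sql.redis")
        .option("table", table)
        .option("key.column", key_column)
        .save()
    )
```

    Usage would look like spark = build_session(redis_spark_conf("XX.XXX.XXX.XXX", auth="XXXX")), followed by save_to_redis(df, "df", "case_id"). Note that spark.jars.packages is only picked up when the JVM is launched, so it has no effect on a session that is already running; in that case restart it or fall back to --packages.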