A couple of options come to mind, and I'm not sure which is better. It's not obvious how to translate the Redshift schema into something Parquet can ingest directly, but the Spark connector may well take care of that.
One option: get the Redshift JDBC jar and use sparkSession.read.jdbc with the Redshift connection details, as in this example:
val properties = new java.util.Properties()
properties.put("driver", "com.amazon.redshift.jdbc42.Driver")
properties.put("url", "jdbc:redshift://redshift-host:5439/")
properties.put("user", "<username>")
// pull the password from Spark config rather than hard-coding it
properties.put("password", spark.conf.get("spark.jdbc.password", "<default_pass>"))

// read the Redshift table into a DataFrame over JDBC
val d_rs = spark.read.jdbc(properties.get("url").toString, "data_table", properties)
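Spark infers the DataFrame schema from the JDBC metadata, so writing the result out as Parquet is then a single call. A minimal sketch, assuming you want the files on S3 (the bucket/path below is just a placeholder):

// Spark maps the Redshift/JDBC column types to Parquet types automatically;
// the s3a path is a placeholder for wherever the files should land.
d_rs.write.mode("overwrite").parquet("s3a://your-bucket/path/data_table/")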
My relevant blog post: http://garrens.com/blog/2017/04/09/connecting-apache-spark-to-external-data-sources/
Spark Streaming should be irrelevant in this case.

I would also recommend the Databricks spark-redshift package, which uses Redshift's UNLOAD command to bulk-export the table to S3 and load it into Spark in parallel; that is much faster than pulling rows one at a time over JDBC. A rough sketch follows.
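This sketch assumes the connector is on the classpath; the --packages coordinates, S3 tempdir, and IAM role ARN are placeholders, and you should check the spark-redshift README for the exact options your version supports:

// launch with the connector available, e.g.:
//   spark-shell --packages com.databricks:spark-redshift_2.11:3.0.0-preview1

// spark-redshift UNLOADs the table to the S3 tempdir and loads the
// resulting files in parallel, instead of streaming rows over JDBC
val df = spark.read
  .format("com.databricks.spark.redshift")
  .option("url", "jdbc:redshift://redshift-host:5439/dbname?user=<username>&password=<password>")
  .option("dbtable", "data_table")
  .option("tempdir", "s3a://your-bucket/redshift-temp/")                 // placeholder bucket
  .option("aws_iam_role", "arn:aws:iam::123456789012:role/your-role")    // placeholder role; some versions use forward_spark_s3_credentials instead
  .load()

// from here, writing Parquet is the same as with the plain JDBC read
df.write.parquet("s3a://your-bucket/path/data_table/")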