Search code examples
apache-sparkspark-structured-streaming

Spark Structured Streaming Kinesis Data source


Is it possible to use Kinesis streams as a data source for Spark structured streaming? I can't find any connector available.


Solution

  • Qubole have a kinesis-sql library for exactly this.

    https://github.com/qubole/kinesis-sql

    Then you can use the source similar to any other Spark Structured Streaming source:

    val source = spark
       .readStream
       .format("kinesis")
       .option("streamName", "spark-source-stream")
       .option("endpointUrl", "https://kinesis.us-east-1.amazonaws.com")
       .option("awsAccessKeyId", [YOUR_AWS_ACCESS_KEY_ID])
       .option("awsSecretKey", [YOUR_AWS_SECRET_KEY])
       .option("startingPosition", "TRIM_HORIZON")
       .load