Tags: spring-boot, apache-spark, spark-streaming

Spark Streaming Spring Boot application


I have created a Spring Boot application that reads data from Kafka using the Spark Streaming API and indexes it into Elasticsearch.

I have a few questions:

• How do I deploy this jar on the Spark master/cluster?
• Where can I see my application log, and what are the recommended ways to collect it?
• What if the master goes down? (It seems like a single point of failure.)

Any leads would be greatly appreciated.


Solution

  • I don't think you need a Spring Boot application just to read from Kafka and index into ES via Spark Streaming; Apache Spark alone is sufficient for that. However, you may have your own reasons.

    Regarding your questions:

    • Deploy: build an uber jar with all your classpath dependencies and submit it to the Spark cluster. The cluster takes care of distributing it across the nodes and making it fault tolerant.
    • Application log: Spark can typically be pointed at a log4j configuration file that controls your application's logging. If you're running on YARN, you can also use the yarn logs command together with grep.
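For the logging point, a minimal log4j.properties sketch is shown below; the `com.example` package name is illustrative, not from the question. Such a file can be shipped alongside the job and referenced via the driver/executor Java options.

```properties
# log4j.properties -- minimal sketch (com.example is an illustrative package name)
log4j.rootCategory=INFO, console
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.target=System.err
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n
# Quiet noisy Spark internals, keep the application at DEBUG
log4j.logger.org.apache.spark=WARN
log4j.logger.com.example=DEBUG
```

On YARN, the aggregated logs of a finished application can then be fetched with `yarn logs -applicationId <appId>` and piped through grep.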
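To make the deploy step concrete, a submission to a standalone cluster might look like the sketch below. The master URL, jar name, and main class are placeholders (assumptions), not values taken from the question; the uber jar itself could be produced with, e.g., the Maven Shade plugin.

```shell
# Sketch of a standalone-cluster submission; all names below are placeholders.
MASTER_URL="spark://spark-master:7077"          # hypothetical master URL
APP_JAR="target/streaming-app-1.0-all.jar"      # hypothetical uber jar
MAIN_CLASS="com.example.StreamingApp"           # hypothetical main class

# Guarded so the snippet is copy-paste safe on machines without Spark installed.
if command -v spark-submit >/dev/null 2>&1; then
  spark-submit \
    --master "$MASTER_URL" \
    --deploy-mode cluster \
    --class "$MAIN_CLASS" \
    --supervise \
    "$APP_JAR"
else
  echo "spark-submit not found on PATH; command shown for illustration only"
fi
```

In standalone cluster mode, `--supervise` asks the cluster to restart the driver automatically if it exits with a non-zero code, which helps with the fault-tolerance side of the question.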