Tags: pyspark, offset, spark-structured-streaming, checkpoint

Pyspark structured streaming, how to stop or restart job if it fails?


I am new to this, so I want to understand:

1. How does checkpointing help when restarting?
2. Do we need to call any method to pass the offset when we restart?
3. How do I stop the job when I have changes in the code logic, so I can update it?


Solution

    1. Yes! That is the whole point of using a checkpoint: failure recovery. On restart with the same checkpoint location, Spark resumes from the offsets stored there (see the sketch after this list): https://www.databricks.com/blog/2015/01/15/improved-driver-fault-tolerance-and-zero-data-loss-in-spark-streaming.html
    2. No extra method call is needed; the offsets are read back from the checkpoint automatically. If you are running on YARN (including EMR, which uses YARN), you can also set the property spark.yarn.maxAppAttempts to control automatic application restarts: https://spark.apache.org/docs/latest/running-on-yarn.html, but that's it.
    3. One way is the yarn kill command: https://sarathkumarsivan.medium.com/how-to-kill-all-applications-running-on-yarn-2052289a7e7. There are multiple ways to implement this; it all depends on where the job is running. A graceful-stop sketch in PySpark follows below.
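To make points 1 and 2 concrete, here is a minimal PySpark sketch. The source, sink, and checkpoint path are placeholders for illustration; in practice the checkpoint should live on durable storage such as HDFS or S3. The key idea is that `checkpointLocation` is where Spark persists offsets and state, so a restart with the same path resumes automatically:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("checkpoint-demo").getOrCreate()

# Toy source for illustration; in a real job this would be Kafka, files, etc.
stream = spark.readStream.format("rate").option("rowsPerSecond", 10).load()

query = (
    stream.writeStream
    .format("console")
    # Offsets and state are persisted here; use a durable path in production.
    .option("checkpointLocation", "/tmp/checkpoints/checkpoint-demo")
    .start()
)

# On restart with the SAME checkpointLocation, Spark reads the stored
# offsets and resumes where it left off -- no offset is passed manually.
query.awaitTermination()
```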
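For point 3, besides killing the YARN application from outside, a common pattern is to stop the query gracefully from inside the driver so you can redeploy updated code. The sketch below assumes `query` is the StreamingQuery from the example above; the marker-file path is a hypothetical signal, and any external trigger (a flag in a database, an HTTP endpoint) would work the same way:

```python
import os
import time

# Hypothetical path: touch this file to request a graceful stop.
STOP_MARKER = "/tmp/stop-streaming-job"

while query.isActive:
    if os.path.exists(STOP_MARKER):
        # Stops the query; offsets and state remain in the checkpoint,
        # so the next deployment resumes from where this one stopped.
        query.stop()
        break
    time.sleep(10)

# Alternatively, from outside the driver on YARN:
#   yarn application -kill <application_id>
```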