scala spark-streaming offset

Spark Streaming direct approach without a checkpoint location


When we use the Spark Streaming direct approach without specifying a checkpoint location, where are the offsets stored, and how?

Is there really any difference between specifying a checkpoint location and not specifying one?

Will there be any data loss if I don't specify a checkpoint location?


Solution

  • If you don't checkpoint, you won't be able to recover if your driver crashes. In addition, because there is no checkpoint, Spark won't persist the Kafka offsets anywhere, so you'll need to store them yourself.

    Is there really any difference between specifying a checkpoint location and not specifying one?

    That sentence doesn't make much sense. If you don't provide a checkpoint directory, there will be no checkpoint; if you do, there will be one. To achieve exactly-once semantics (if required), you'll need to store the offsets manually.
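
To illustrate what "storing offsets manually" can look like, here is a minimal sketch. It assumes the `spark-streaming-kafka-0-10` integration (the question does not say which Kafka integration is in use), and the broker address, topic name, and group id are placeholders rather than values from the question. No checkpoint directory is set; instead, each batch reads its offset ranges and commits them back to Kafka once the batch's work is done.

```scala
import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010.{CanCommitOffsets, HasOffsetRanges, KafkaUtils}
import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe
import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent

object ManualOffsetsSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("manual-offsets-sketch").setMaster("local[2]")
    val ssc = new StreamingContext(conf, Seconds(5))
    // Note: ssc.checkpoint(...) is deliberately NOT called here.

    // Placeholder connection settings (assumptions, adjust for your cluster).
    val kafkaParams = Map[String, Object](
      "bootstrap.servers" -> "localhost:9092",
      "key.deserializer" -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "group.id" -> "example-group",
      "auto.offset.reset" -> "earliest",
      // Turn off auto-commit so offsets only advance when we commit them.
      "enable.auto.commit" -> (false: java.lang.Boolean)
    )

    val stream = KafkaUtils.createDirectStream[String, String](
      ssc,
      PreferConsistent,
      Subscribe[String, String](Seq("example-topic"), kafkaParams)
    )

    stream.foreachRDD { rdd =>
      // Offset ranges must be read from the RDD produced directly by the stream,
      // before any transformation that changes its partitioning.
      val offsetRanges = rdd.asInstanceOf[HasOffsetRanges].offsetRanges

      // Stand-in for the real processing and output of this batch.
      println(s"Processed ${rdd.count()} records")

      // Commit only after the batch's output has completed.
      stream.asInstanceOf[CanCommitOffsets].commitAsync(offsetRanges)
    }

    ssc.start()
    ssc.awaitTermination()
  }
}
```

On restart, a consumer with the same group id resumes from the last committed offsets. Committing after the output gives at-least-once delivery, so a batch may be replayed after a crash; for exactly-once results the output would need to be idempotent, or the offsets would need to be saved in the same transaction as the results in your own data store.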