I am new to AWS Glue and I would like to understand how a Spark job behaves. I have a Spark job that fails due to a high rate of S3 PUT requests. Some of the input files are processed (to be clear, the successfully processed files have already been written to the sink bucket), while others are not, and job.commit() has not been reached yet. If job bookmarks are enabled, will a subsequent run reprocess the files that have already been written to the sink, or will it only perform an incremental update from the point where the job failed?
The documentation on job bookmarks is really helpful and even includes an example for your use case.
Long story short:
If a job run fails before job.commit() is reached, the bookmark state is not updated, so the next run reprocesses all of the input files from the failed run, including the ones that were already written to the sink. Incremental processing only resumes from the last successful job.commit().
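For context, here is a minimal sketch of what a bookmark-enabled Glue (PySpark) job typically looks like. The JOB_NAME argument, bucket paths, and formats below are placeholders, not taken from your setup. The point is that bookmark state is only persisted by the job.commit() call at the very end, after all writes:

```python
import sys

from awsglue.context import GlueContext
from awsglue.job import Job
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext

args = getResolvedOptions(sys.argv, ["JOB_NAME"])
sc = SparkContext()
glue_context = GlueContext(sc)
job = Job(glue_context)
job.init(args["JOB_NAME"], args)  # loads the existing bookmark state for this job

# transformation_ctx is the key the bookmark uses to track which files were read
source = glue_context.create_dynamic_frame.from_options(
    connection_type="s3",
    connection_options={"paths": ["s3://source-bucket/input/"]},  # placeholder path
    format="json",
    transformation_ctx="source",
)

glue_context.write_dynamic_frame.from_options(
    frame=source,
    connection_type="s3",
    connection_options={"path": "s3://sink-bucket/output/"},  # placeholder path
    format="parquet",
    transformation_ctx="sink",
)

# Only here is the bookmark advanced. If the job dies before this line,
# the next run reads everything from this run again, even files that
# were already written to the sink.
job.commit()
```

Because the writes happen before the commit, a failed run can leave data in the sink that the next run writes again, so it is worth making the job idempotent (for example by overwriting partitions or deduplicating downstream).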