amazon-web-services | aws-glue | aws-glue-spark

How to stop / exit an AWS Glue Job (PySpark)?


I have a successfully running AWS Glue Job that transforms data for predictions. I would like to stop processing and output a status message (which is working) if I reach a specific condition:

if specific_condition is None:
    s3.put_object(Body=json_str, Bucket=output_bucket, Key=json_path )
    return None

This produces "SyntaxError: 'return' outside function", so I tried:

if specific_condition is None:
    s3.put_object(Body=json_str, Bucket=output_bucket, Key=json_path )
    job.commit()

This is not running in AWS Lambda; it is a Glue Job that gets started from Lambda (e.g., via start_job_run()).
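
For reference, a minimal sketch of such a trigger, assuming a Lambda handler that starts the job with boto3 (the job name and argument key here are placeholders, not from the actual setup):

import boto3

glue = boto3.client("glue")

def lambda_handler(event, context):
    # Start the Glue job asynchronously; JobName is a placeholder
    response = glue.start_job_run(
        JobName="transform-for-predictions",
        Arguments={"--input_path": event.get("input_path", "")},
    )
    return {"JobRunId": response["JobRunId"]}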


Solution

  • Since @amsh's solution did not work for me, I continued to look for a solution and discovered that:

    os._exit() terminates immediately at the C level and does not perform any of the normal tear-downs of the interpreter.

    Thanks to @Glyph's answer! You can then proceed this way:

    import os

    if specific_condition is None:
        s3.put_object(Body=json_str, Bucket=output_bucket, Key=json_path)
        job.commit()
        os._exit(0)  # os._exit() requires an explicit exit status
    

    Your job will succeed instead of failing with a "SystemExit: 0" error.
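
    Putting it together, a minimal sketch of a full job script using this pattern (specific_condition, json_str, output_bucket, and json_path are the names from the question; the transformation logic itself is elided):

    import os
    import sys

    import boto3
    from awsglue.context import GlueContext
    from awsglue.job import Job
    from awsglue.utils import getResolvedOptions
    from pyspark.context import SparkContext

    args = getResolvedOptions(sys.argv, ["JOB_NAME"])
    glue_context = GlueContext(SparkContext())
    job = Job(glue_context)
    job.init(args["JOB_NAME"], args)

    s3 = boto3.client("s3")

    # ... transformation logic that sets specific_condition, json_str, etc. ...

    if specific_condition is None:
        # Persist the status message, commit the job state, then leave the
        # interpreter immediately without raising SystemExit
        s3.put_object(Body=json_str, Bucket=output_bucket, Key=json_path)
        job.commit()
        os._exit(0)  # exit status 0, so the run is reported as succeeded

    # ... the rest of the job runs only when the condition is met ...
    job.commit()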