python aws-glue

AWS Glue: Get list of objects read by create_dynamic_frame.from_options


I'm using create_dynamic_frame.from_options to read CSV files into a Glue DynamicFrame. My Glue job has job bookmarks enabled, and the from_options call sets both a transformation_ctx and recursive search.

dyf = glueContext.create_dynamic_frame.from_options(
    connection_type="s3",
    connection_options={
        "paths": ["s3://bucket/files/"],
        "recurse": True,
    },
    format="csv",
    transformation_ctx="example",
)

s3://bucket/files/ contains multiple CSVs. Is there a way to get a list of the objects that were actually read? Because I'm using bookmarks, files that were already processed in an earlier run are skipped, and those skipped files should not appear in the list.


Solution

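  • You could try Spark's input_file_name(): convert the DynamicFrame to a DataFrame, tag each row with the file it came from, and show the distinct values. Since the frame only holds rows from files the bookmark did not skip, only files read in this run are listed. It needs the import from pyspark.sql.functions import input_file_name, and truncate=False keeps long S3 paths from being cut off: dyf.toDF().withColumn("input_file", input_file_name()).select("input_file").distinct().show(truncate=False)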
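
If you need the list as Python values rather than printed output, the same idea can be wrapped in a small helper. This is only a sketch: list_read_objects and split_s3_uri are hypothetical names, not part of Glue or Spark, and it assumes input_file_name() is actually populated for your DynamicFrame. That may not hold in every configuration (for example when file grouping is enabled), so check the result on your own job.

```python
from urllib.parse import urlparse


def list_read_objects(dyf):
    """Return the sorted, distinct source URIs behind the rows of a DynamicFrame.

    Hypothetical helper: `dyf` is assumed to be the frame returned by
    create_dynamic_frame.from_options in the question.
    """
    # Imported here so split_s3_uri below stays usable without a Spark install.
    from pyspark.sql.functions import input_file_name

    rows = (
        dyf.toDF()
        .select(input_file_name().alias("input_file"))
        .distinct()
        .collect()
    )
    # input_file_name() gives an empty string when Spark cannot attribute a
    # row to a file, so drop those entries.
    return sorted(row["input_file"] for row in rows if row["input_file"])


def split_s3_uri(uri):
    """Split 's3://bucket/key' into a (bucket, key) tuple."""
    parsed = urlparse(uri)
    return parsed.netloc, parsed.path.lstrip("/")
```

In the job you would call list_read_objects(dyf) right after creating the frame, then pass each entry through split_s3_uri if you need bucket and key separately. Note that a file contributing no rows (an empty CSV, say) cannot show up in this list, because the file names are derived from the rows themselves.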