I'm using `create_dynamic_frame.from_options` to read CSV files into a Glue DynamicFrame. My Glue job has job bookmarks enabled, and the `from_options` call sets both a `transformation_ctx` and recursive reading (`"recurse": True`):
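```python
dyf = glueContext.create_dynamic_frame.from_options(
    connection_type="s3",
    connection_options={
        "paths": ["s3://bucket/files/"],
        "recurse": True,  # also read objects under sub-prefixes
    },
    format="csv",  # S3 sources need an explicit format
    transformation_ctx="example",  # lets the job bookmark track this source
)
```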
`s3://bucket/files` contains multiple CSV files. Is there a way to get a list of the objects that were actually read? Because I'm using bookmarks, files that were already processed in a previous run are skipped, and those skipped files should not appear in the list.
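You could try adding the source file name as a column and listing its distinct values. Since the bookmark makes the job read only files it has not processed before, the DynamicFrame contains rows only from the files read in this run:

```python
from pyspark.sql.functions import input_file_name

dyf.toDF().withColumn("input_file", input_file_name()).select("input_file").distinct().show()
```

Note that a file that was read but produced no rows (for example, an empty or header-only CSV) will not appear, because the list is derived from the rows themselves.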