Can we use AWS Glue for the following?
Yes, this can be done by using "connectionType": "mongodb" as the source
in your Glue ETL job; refer to this for the syntax.
It also has the example below, which reads data from MongoDB; the data can then be written to S3 in Parquet format.
mongo_uri = "mongodb://<mongo-instanced-ip-address>:27017"
read_mongo_options = {
    "uri": mongo_uri,
    "database": "test",
    "collection": "coll",
    "username": "username",
    "password": "pwd",
    "partitioner": "MongoSamplePartitioner",
    "partitionerOptions.partitionSizeMB": "10",
    "partitionerOptions.partitionKey": "_id",
}
dynamic_frame = glueContext.create_dynamic_frame.from_options(
    connection_type="mongodb",
    connection_options=read_mongo_options)
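If you read from more than one collection, it may help to build the options dictionary from parameters rather than hard-coding it. The following is only a minimal sketch: `build_mongo_read_options` is a hypothetical helper name (not part of the Glue API), the host `10.0.0.5` is a made-up placeholder, and the dictionary keys are simply the ones used in the example above.

```python
def build_mongo_read_options(host, database, collection, username, password,
                             port=27017, partition_size_mb=10,
                             partition_key="_id"):
    """Hypothetical helper: build connection_options for connection_type="mongodb"."""
    return {
        "uri": f"mongodb://{host}:{port}",
        "database": database,
        "collection": collection,
        "username": username,
        "password": password,
        # Partitioner settings copied from the example above.
        "partitioner": "MongoSamplePartitioner",
        # The example passes the partition size as a string, so do the same here.
        "partitionerOptions.partitionSizeMB": str(partition_size_mb),
        "partitionerOptions.partitionKey": partition_key,
    }


# Placeholder values; replace with your own host and credentials.
read_mongo_options = build_mongo_read_options(
    "10.0.0.5", "test", "coll", "username", "pwd")
```

The resulting dictionary can be passed as `connection_options` in exactly the same way as the hand-written one.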
Once you have the data, you can write it back to S3 with the statement below, after performing any transformations you need:
glueContext.write_dynamic_frame.from_options(
    frame=dynamic_frame,
    connection_type="s3",
    connection_options={"path": "s3://glue-parquet/output-dir"},
    format="parquet")
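The read and write steps can be tied together in one function. This is a rough sketch rather than a definitive job script: `copy_mongo_to_s3` and its `transform` parameter are hypothetical names introduced here, and it assumes you pass in a `GlueContext` that the job has already created (in a Glue job this is typically `GlueContext(SparkContext.getOrCreate())`).

```python
def copy_mongo_to_s3(glue_context, read_options, output_path, transform=None):
    """Hypothetical wrapper: read a MongoDB collection and write it to S3 as Parquet.

    glue_context is expected to be an awsglue.context.GlueContext created by the job.
    transform is an optional callable that takes and returns a DynamicFrame.
    """
    # Read the collection into a DynamicFrame, as in the snippet above.
    frame = glue_context.create_dynamic_frame.from_options(
        connection_type="mongodb",
        connection_options=read_options)

    # Apply any transformation before writing.
    if transform is not None:
        frame = transform(frame)

    # Write the (possibly transformed) frame to S3 in Parquet format.
    glue_context.write_dynamic_frame.from_options(
        frame=frame,
        connection_type="s3",
        connection_options={"path": output_path},
        format="parquet")
    return frame
```

Inside the job you would call it as `copy_mongo_to_s3(glueContext, read_mongo_options, "s3://glue-parquet/output-dir")`. Taking the context as an argument keeps the function free of Glue imports, so it can be exercised with a stub outside the Glue runtime.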