I need to perform an append load to an S3 bucket.
Now I need to write this dynamic frame to an S3 bucket that already contains all of the previous days' partitions. In fact, I only need to write a single partition to the bucket. Currently I am using the piece of code below to write data to S3:
// Write it out in Parquet for ERROR severity
glueContext.getSinkWithFormat(
  connectionType = "s3",
  options = JsonOptions(Map(
    "path" -> "s3://some s3 bucket location",
    "partitionKeys" -> Seq("partitonyear", "partitonmonth", "partitonday"))),
  format = "parquet"
).writeDynamicFrame(DynamicFrame(dynamicDataframeToWrite.toDF().coalesce(maxExecutors), glueContext))
I am not sure whether the above code performs an append load. Is there a way to achieve this through the AWS Glue libraries?
Your script will append new data files to the appropriate partition. So if you are processing only today's data, it will create a new partition under the target path. For example, if today is 2018-11-28, it will create a new data object in the s3://some_s3_bucket_location/partitonyear=2018/partitonmonth=11/partitonday=28/ folder.
If you write data into an existing partition, Glue will append new files and will not remove the existing objects. However, this may lead to duplicates if you run the job multiple times over the same data.
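If re-runs over the same data are possible, one way to keep the write idempotent is Spark's dynamic partition overwrite, which replaces only the partitions present in the incoming data and leaves all other partitions under the path untouched. This is a sketch, not the Glue sink API: it assumes the job runs on Spark 2.3+ (where `spark.sql.sources.partitionOverwriteMode` exists) and writes through the plain Spark `DataFrameWriter` instead of `getSinkWithFormat`.

```scala
// Sketch: make the partition write idempotent across re-runs.
// Assumes Spark 2.3+; bypasses the Glue sink and uses the Spark writer.
val spark = glueContext.sparkSession

// In "dynamic" mode, mode("overwrite") replaces only the partitions
// that actually appear in the DataFrame being written; existing
// partitions for other days are not deleted.
spark.conf.set("spark.sql.sources.partitionOverwriteMode", "dynamic")

dynamicDataframeToWrite.toDF()
  .coalesce(maxExecutors)
  .write
  .mode("overwrite")
  .partitionBy("partitonyear", "partitonmonth", "partitonday")
  .parquet("s3://some s3 bucket location")
```

With this setup, re-processing 2018-11-28 rewrites only the partitonyear=2018/partitonmonth=11/partitonday=28/ prefix instead of appending duplicate files to it.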