I have job in Redshift that is responsible for pulling 6 files every month from S3. File names follow a standard naming convention as "file_label_MonthNameYYYY_Batch01.CSV". I'd like to modify the below COPY command to change the file naming in the S3 directory dynamically so I won't have to hard code the Month Name and YYYY and batch number. Batch number ranges 1-6.
Currently, here is what I have which is not efficient:
COPY tbl_name ( column_name1, column_name2, column_name3 )
FROM 'S3://bucket_name/folder_name/Static_File_Label_July2021_Batch01.CSV'
CREDENTIALS 'aws_access_key_id = xxx;aws_secret_access_key = xxxxx'
removequotes
EMPTYASNULL
BLANKSASNULL
DATEFORMAT 'MM/DD/YYYY'
delimiter ','
IGNOREHEADER 1;
COPY tbl_name ( column_name1, column_name2, column_name3 )
FROM 'S3://bucket_name/folder_name/Static_File_Label_July2021_Batch02.CSV'
CREDENTIALS 'aws_access_key_id = xxx;aws_secret_access_key = xxxxx'
removequotes
EMPTYASNULL
BLANKSASNULL
DATEFORMAT 'MM/DD/YYYY'
delimiter ','
IGNOREHEADER 1;
The dynamic file name shall change to August2021_Batch01 & August2021_Batch02 next month and so forth. Is there a way to do this? Thank you in advance.
There are lots of approaches to this. Which one is best for your case will depend on your circumstances. You need a layer in your process that controls configuring SQL for each month. Here are some ways to consider:
It is desirable to load multiple files with a COPY command (uses more nodes in parallel) and options 1, 2, and 5 do this.