I have a list of files in the s3
bucket as below. I want to load only the current date files, for example,
I want to get all file_a.csv which are under 2020/06/09, similarly get all file_b.csv under same 2020/06/09
I tried out the syntax but this gets me all files from all dates
select metadata$filename from @stage/snflk/ts/(file_format=>CSV_SKIP_HEADER,pattern=>'.*/file_a.*[.]csv')
snflk/ts/ts-prod/2020/06/08/file_a.csv
snflk/ts/ts-prod/2020/06/09/file_a.csv
snflk/ts/ts-prod3/2020/06/08/file_a.csv
snflk/ts/ts-prod3/2020/06/09/file_a.csv
snflk/ts/ts-prod2/2020/06/08/file_a.csv
snflk/ts/ts-prod2/2020/06/09/file_a.csv
snflk/ts/ts-prod/2020/06/08/file_b.csv
snflk/ts/ts-prod/2020/06/09/file_b.csv
snflk/ts/ts-prod3/2020/06/08/file_b.csv
snflk/ts/ts-prod3/2020/06/09/file_b.csv
snflk/ts/ts-prod2/2020/06/08/file_b.csv
snflk/ts/ts-prod2/2020/06/09/file_b.csv
I want to load only the current date files
.*/file_a.*[.]csv
this gets me all files from all dates
The pattern (.*
) supplied at the beginning here is a wildcard that will match everything available.
If you need to limit the date to a constant value, specify the constant values inside the regular expression pattern (i.e. ts-prod.*/2020/06/09/file_a\.csv
, ts-prod.*/2020/06/09/file_b\.csv
, etc.):
select
metadata$filename
from @stage/snflk/ts/
(pattern=>'ts-prod.*/2020/06/09/file_a\.csv');
To iteratively test and write your regex patterns, you can use web tools such as Regex101, RegExr, etc.
P.s. You can also construct the pattern to use the current date dynamically, if you are trying to automate this:
-- Produces 'ts-prod.*/2020/06/11/file_a\.csv'
SET curr_dt_a=(
SELECT
'ts-prod.*/' ||
TO_VARCHAR(CURRENT_DATE(), 'YYYY/MM/DD') ||
'/file_a\\.csv'
)
-- Referenced as a SQL variable in the PATTERN option
SELECT
metadata$filename
FROM @stage/snflk/ts/
(PATTERN=>$curr_dt_a);