Search code examples
amazon-s3snowflake-cloud-data-platformsnowflake-schemaexternal-tables

How to create a external table when date is in between s3 path


I have s3 bucket as below,

s3://rawzone/toyoto/2020-11-04/

inside each date I have 4 files

 s3://rawzone/toyoto/2020-11-04/Company.json
 s3://rawzone/toyoto/2020-11-04/sales.json
 s3://rawzone/toyoto/2020-11-04/transport.json
 s3://rawzone/toyoto/2020-11-04/preaquisitions.json

and I have so many dates like this

 s3://rawzone/toyoto/2020-11-05/Company.json
 s3://rawzone/toyoto/2020-11-05/sales.json
 s3://rawzone/toyoto/2020-11-05/transport.json
 s3://rawzone/toyoto/2020-11-05/preaquisitions.json

I need to create a table for across all dates for Company,sales,transport,preaquisitions

How can I retrieve this data from s3 bucket for each table?

So far tried (kept ??? where I don't know how to replace it)

 create or replace external table toyoto.PB_compnay
    (
       columns mapping......
    )
    partition by ???
    with location = @raw_zone/pitchbook/??????
    file_format = json_format
    aws_sns_topic='arn:aws:sns:us-west-1:5438:dev-autore'
    auto_refresh = true 

Solution

  • The practice is to partition the data files based on increments of time; or, if the data files are staged from multiple sources, to partition by a data source identifier and date or timestamp.

    If you prefer to have 4 tables based on the filenames, I would recommend having a pattern that would pick up all the files that have that name.

    create or replace external table toyoto.PB_compnay
     (
       columns mapping......
     )
     with location = @stage/path/
     file_format = json_format
     pattern='.*/.*/Company[.]json'
     aws_sns_topic='arn:aws:sns:us-xxx:xxx:xxxx'
     auto_refresh = true