Search code examples
sqlamazon-s3snowflake-cloud-data-platformexternal-tablessnowsql

Create External Table pointing to S3


How do we create an external table using Snowflake sql that points to a directory in S3? Below is the code I tried so far, but didn't work. Any help is highly appreciated.

create external table my_table
(
column1 varchar(4000),
column2 varchar(4000)
)
LOCATION 's3a://<externalbucket>'

Note : The file that I have in the S3 bucket is a csv file (comma seperated, double quotes enclosed and with header).


Solution

  • Finally I sorted this out. Posting this answer as to make the answer simple to understand especially for the beginners.

    Say that I have a csv file in the S3 location in the below format.

    enter image description here

    Step 1 :

    Create a file format in which you can define what type of file it is, field delimiter, data enclosed in double quotes, skip the header of the file etc.

    create or replace file format schema_name.pipeformat
    type = 'CSV' 
    field_delimiter = '|' 
    FIELD_OPTIONALLY_ENCLOSED_BY = '"'
    skip_header = 1
    

    https://docs.snowflake.com/en/sql-reference/sql/create-file-format.html

    Step 2 :

    Create a Stage to specify the S3 details and file format.

    create or replace stage schema_name.stage_name 
    url='s3://<path where file is kept>'
    credentials=(aws_key_id='****' aws_secret_key='****')
    file_format = pipeformat
    

    https://docs.snowflake.com/en/sql-reference/sql/create-stage.html#required-parameters

    Step 3 :

    Create the external table based on the Stage name and file format.

    create or replace external table schema_name.table_name 
    (
        RollNumber INT as  (value:c1::int), 
        Name varchar(20) as ( value:c2::varchar), 
        Marks int as (value:c3::int)
    )
    with location = @stage_name
    file_format = pipeformat
    

    https://docs.snowflake.com/en/sql-reference/sql/create-external-table.html

    Step 4 :

    Now you should be able to query from the external table.

    select * 
    from schema_name.table_name