Search code examples
snowflake-cloud-data-platformpartitionstage

Common practice for stages


Snowflake allows to put files of different structure in just one stage using different paths. On the other hand we can put files of the same structure in separate stage.

Is stage a store for several tables of a schema or is stage a mean to store data for a partitioned table? What is the usual practice?


Solution

  • There are a few different types of stages in Snowflake:

    Internal Stages (Named, User and Table): With these types of stages, you upload the files directly to Snowflake. If you wanted to load data into multiple tables from a single stage you can either use a "Named" or "User" stage. A "Table" stage is automatically created when you create a table and it's for loading data into a single table only. With all internal stages, you typically upload data into Snowflake using SnowSQL from your local machine or a server and then run a copy command into a table.

    External Stages (External Stages): External stages are the most common in my experience. You create a stage inside Snowflake that points to a cloud provider's blob storage service (s3, gcs, azure blob). The files are not stored in Snowflake like they are with an Internal Stage, they are stored in s3 (or whatever) and you can run copy commands to load into any table.

    There is no right answer, you can either use Internal (Named or User) or External stages to load into multiple tables. My preference is to use an external stage, that way the data resides outside of Snowflake and can be loaded into other tools too if necessary.