
Snowflake COPY INTO parallel Parquet file load


How can I do a one-time load of 5 years of Parquet data into a table with COPY INTO? Loading 1 month took me 1.5 hours, so 5 years would take about 90 hours. Is there a way to load in parallel, and how can I do it?

The code below is for 1 month:

COPY INTO "PUBLIC"."MYTABLE" ("One_Month_Transactions")
FROM (SELECT $1:"One_Month_Transactions" FROM @my_azure_stage/data201601)
FILE_FORMAT = (TYPE = PARQUET);  -- assuming no Parquet file format is set on the stage

Thanks


Solution

  • Take the number of files you have (hopefully the data is split across many, many files) and size your warehouse so it can load them all in a single COPY INTO statement. Assuming you dedicate a warehouse to that single COPY INTO statement, it will load 8 files in parallel per node. An XS warehouse has 1 node, an S has 2, a Medium has 4, and the node count doubles with each size from there.

    If you share the number of files per month and the warehouse size you used, I can help size things more accurately, but the key is doing it all in 1 COPY INTO statement...and doing the math on 8 files/node * number of nodes per size. For example, an XL warehouse has 16 nodes, so it can work on 16 * 8 = 128 files at once; see the sketch after this list.
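
A minimal sketch of that approach, assuming the monthly folders follow the data201601 naming from the question and cover 2016 through 2020, and that a dedicated warehouse named LOAD_WH exists (the warehouse name and the PATTERN regex are assumptions, not from the original post):

-- Count the staged files first, so you can size the warehouse (8 files per node).
LIST @my_azure_stage;

-- Hypothetical warehouse name; XLARGE = 16 nodes, so ~128 files load in parallel.
ALTER WAREHOUSE LOAD_WH SET WAREHOUSE_SIZE = 'XLARGE';

-- One COPY INTO over all 5 years; PATTERN matches the assumed folder names
-- data201601 through data202012 against each file's full stage path.
COPY INTO "PUBLIC"."MYTABLE" ("One_Month_Transactions")
FROM (SELECT $1:"One_Month_Transactions" FROM @my_azure_stage)
PATTERN = '.*data20(1[6-9]|20).*'
FILE_FORMAT = (TYPE = PARQUET);

The single statement lets Snowflake spread the whole file list across every node of the warehouse at once, instead of serializing 60 monthly loads; for example, if the 5 years amounted to roughly 3,000 files, an XL warehouse would work through them in about 3000 / 128 ≈ 24 waves.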