Search code examples
sqlsnowflake-cloud-data-platformdata-partitioning

How to generate a single file per partition - Snowflake COPY into location


I've managed to unload my data into a partitions, but each one of them is also being partitioned into multiple files. Is there a way to force Snowflake to generate a single file per partition?

It also would be great if I can zip all the files.

This is what I got so far:

COPY INTO 'gcs_bucket'
  FROM test
  PARTITION BY TRUNC(number_of_rows/500000)
  STORAGE_INTEGRATION = gcs_int
  FILE_FORMAT = (TYPE = CSV, COMPRESSION = gzip, NULL_IF = ('NULL','null'), RECORD_DELIMITER= '\r\n', FIELD_OPTIONALLY_ENCLOSED_BY = "'")
  HEADER = TRUE

PS. I'm using csv format (can't change that)


Solution

  • The upper size limit of each file could be changed with MAX_FILE_SIZE option. Default is 16MB.

    COPY INTO 'gcs_bucket'
      FROM test
      PARTITION BY TRUNC(number_of_rows/500000)
      STORAGE_INTEGRATION = gcs_int
      ...
      MAX_FILE_SIZE = 167772160 -- (160MB)
    

    MAX_FILE_SIZE = num

    Definition

    Number (> 0) that specifies the upper size limit (in bytes) of each file to be generated in parallel per thread. Note that the actual file size and number of files unloaded are determined by the total amount of data and number of nodes available for parallel processing.

    Snowflake utilizes parallel execution to optimize performance. The number of threads cannot be modified.**