Search code examples
snowflake-schemasnowflake-tasksnowflake-cloud-data-platformsnowflake-pipe

How to load 533 columns of data into snowflake table?


We have a table with 533 columns with a lot of LOB columns that have to be moved to snowflake. Since our source transformation system having an issue to manage 533 columns in one job. We have split ted the columns into 2 jobs. The first job will insert 283 columns and the second job needs to update the remaining column.

We are using one copy command and upsert command respectively for these two jobs.

copy command

copy into "ADIUATPERF"."APLHSTRO"."FNMA1004_APPR_DEMO" (283 columns) from @"ADIUATPERF"."APLHSTRO".fnma_talend_poc/jos/outformatted.csv
--file_format = (format_name = '"ADIUATPERF"."APLHSTRO".CSV_DQ_HDR0_NO_ESC_CARET');
FILE_FORMAT = (DATE_FORMAT='dd-mm-yyyy', TIMESTAMP_FORMAT='dd-mm-yyyy',TYPE=CSV, ESCAPE_UNENCLOSED_FIELD = NONE,
SKIP_HEADER=1, field_delimiter ='|', RECORD_DELIMITER = '\\n', FIELD_OPTIONALLY_ENCLOSED_BY = '"',
               NULL_IF = ('')) PATTERN='' on_error = 'CONTINUE',FORCE=true;

Upsert command

MERGE INTO db.schema._table as target 
 USING
(SELECT t.$1
from @"ADIUATPERF"."APLHSTRO".fnma_talend_poc/jos/fnma1004_appr.csv 
--file_format = (format_name = '"ADIUATPERF"."APLHSTRO".CSV_DQ_HDR0_NO_ESC_CARET');
(FILE_FORMAT =>'CSV', ESCAPE_UNENCLOSED_FIELD => NONE,
SKIP_HEADER=>1, field_delimiter =>'|', RECORD_DELIMITER => '\\n', FIELD_OPTIONALLY_ENCLOSED_BY => '"',
               NULL_IF => (''))

) source ON target.document_id = source.document_id
WHEN MATCHED THEN
--update lst_updated
UPDATE SET <columns>=<values>;

I would like to know if we have any other option ?


Solution

  • I would recommend that you run the COPY INTO for both of your split files into temp/transient tables, first. And then execute a single CTAS statement using the JOIN between those 2 tables on document_id. Don't MERGE from a flat file. You could, optionally, run a MERGE on the 2nd temp table into the first table (not temp), if you wished, but I think a straight CTAS from 2 "half" tables might be faster for you.