
Snowflake Copy Into rejecting file without error in Python Snowflake Connector


I'm using the Python Snowflake Connector to PUT a JSON file into a Snowflake Stage and then run COPY INTO to load the JSON into a table.

Here's my code:

import snowflake.connector

snowflake_conn = snowflake.connector.connect(
    user=sf_user,
    password=sf_password,
    account=sf_account
)

# Create a cursor on the connection; the statements below run through it
snowflake_conn_cur = snowflake_conn.cursor()

role_init = "USE ROLE ELT_ROLE"
wh_init = "USE WAREHOUSE TEST_WH"
db_init = "USE DATABASE TEST_DB"
schema_init = "USE SCHEMA TEST_SCHEMA"

snowflake_conn_cur.execute(role_init)
snowflake_conn_cur.execute(wh_init)
snowflake_conn_cur.execute(db_init)
snowflake_conn_cur.execute(schema_init)

remove_file_command = 'REMOVE @TEST_STAGE/test_file.json;'
put_file_command = 'PUT file://test_file.json @TEST_STAGE;'
truncate_existing_table_data_command = 'TRUNCATE TABLE OUTPUT_TABLE;'
copy_file_command = 'COPY INTO OUTPUT_TABLE FROM @TEST_STAGE/test_file.json file_format=(TYPE=JSON) on_error=CONTINUE;'

snowflake_conn_cur.execute(remove_file_command)
snowflake_conn_cur.execute(put_file_command)
snowflake_conn_cur.execute(truncate_existing_table_data_command)
snowflake_conn_cur.execute(copy_file_command)

My code executes successfully, but I noticed in Snowflake that the file was rejected (separate issue).

In the Snowflake Python Connector, is there a way for the cursor's execute calls to return an error, so I can use that to validate whether the statement completed successfully?

Without that, the load basically fails silently. The only other way I can think of is to query the table afterward to see whether it has data, but that won't always help if the table wasn't truncated beforehand.
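
For illustration, one option is to inspect the result rows that COPY INTO itself returns through the cursor rather than relying on an exception. This is a minimal sketch, assuming Snowflake's documented COPY INTO output columns (file, status, errors_seen); with on_error=CONTINUE a rejected file reports a non-LOADED status instead of raising an error:

# Sketch: COPY INTO returns one result row per file; check the status column
snowflake_conn_cur.execute(copy_file_command)

columns = [col[0].lower() for col in snowflake_conn_cur.description]
for row in snowflake_conn_cur.fetchall():
    result = dict(zip(columns, row))
    if result.get("status") != "LOADED":
        raise RuntimeError(
            f"COPY INTO did not load {result.get('file')}: "
            f"status={result.get('status')}, errors_seen={result.get('errors_seen')}"
        )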


Solution

  • Remove the on_error=CONTINUE option from the COPY INTO statement so that a failed load raises an error instead of being skipped. In this case, the underlying error was that the file was too large to load.

    To resolve the file size issue: since the JSON was wrapped in an outer array, setting STRIP_OUTER_ARRAY=TRUE in the file_format of the COPY INTO command strips the array and loads each JSON element into its own row in the target table, as in the sketch below.
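
    A minimal sketch of the corrected load step (same cursor, table, and stage names as above); without on_error=CONTINUE, Snowflake's default ON_ERROR behavior aborts the statement, so a rejected file surfaces as a ProgrammingError from the connector:

    from snowflake.connector.errors import ProgrammingError

    copy_file_command = (
        "COPY INTO OUTPUT_TABLE FROM @TEST_STAGE/test_file.json "
        "file_format=(TYPE=JSON STRIP_OUTER_ARRAY=TRUE);"
    )

    try:
        # Each element of the outer JSON array is loaded into its own row
        snowflake_conn_cur.execute(copy_file_command)
    except ProgrammingError as e:
        # A failed load now raises here instead of failing silently
        print(f"COPY INTO failed: {e}")
        raise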