Search code examples
pysparkapache-spark-sqlprivilegessnowflake-cloud-data-platform

SparkSQL (Databricks): Insert data into Snowflake Table, created by different role


I have a table MYSCHEMA.TEST_SNOWFLAKE_ROLE_T in Snowflake created using the role CONSOLE_USER.

MYSCHEMA has a FUTURE GRANTS associated with it, which grants the following privileges to the role BATCH_USER for any table created under the schema MYSCHEMA - DELETE, INSERT, REFERENCES, SELECT, TRUNCATE, UPDATE.

The role BATCH_USER also has CREATE STAGE and USAGE privileges on the schema MYSCHEMA.

A second user belonging to the role BATCH_USER tries to insert data into the same table from a dataframe, using the following Spark SQL (Databricks), but fails with an insufficient privileges error message.

df.write.mode(op_mode)                   \ 
  .format("snowflake")                   \ 
  .options(**self.sfoptions)             \
  .option("dbtable", snowflake_tbl_name) \
  .option("truncate_table", "on")        \
  .save

The following error message appears:

Py4JJavaError: An error occurred while calling o908.save.
   : net.snowflake.client.jdbc.SnowflakeSQLException: SQL access control error
   : Insufficient privileges to operate on table 'TEST_SNOWFLAKE_ROLE_T')

The role CONSOLE_USER has ownership rights on the table, hence the role BATCH_USER would not be able to drop the table, but adding the option option("truncate_table", "on") should have prevented automatic overwrite of the Table schema.

I've gone through the available Snowflake and Databricks documentation several times, but can't seem to figure out what is causing the insufficient privilege issue.

Any help is much appreciated!


Solution

  • I figured it out eventually.

    The error occured because the table was created by the role CONSOLE_USER, which retained ownership privileges on the table.

    The Spark connector for Snowflake uses a staging table for writing the data. If the data loading operation is successful, the original target table is dropped and the staging table is renamed to the original target table’s name.

    Now, in order to rename a table or swap two tables, the role used to perform the operation must have OWNERSHIP privileges on the table(s). In the situation above, the ownership was never transferred to the role BATCH_USER, hence the error.

    df.write.mode(op_mode)                   \ 
      .format("snowflake")                   \ 
      .options(**self.sfoptions)             \
      .option("dbtable", snowflake_tbl_name) \
      .option("truncate_table", "on")        \
      .option("usestagingtable", "off")      \
      .save
    

    The solution was to avoid using a staging table altogether, although going by the documentation, Snowflake recommends using one, pretty strongly.