Search code examples
pythonsqlalchemysnowflake-cloud-data-platformgreat-expectations

Great Expectations SQLAlchemy doesn't enclose lowercase column names


I am trying to use SQLAlchemy and Great Expectations for testing data quality of datasets stored in Snowflake DB. The dataset is called candidates and the column of interest is called first_name.

However, when I run

sql_dataset = great_expectations.dataset.SqlAlchemyDataset(table_name="candidates", engine=engine, schema=creds["schema"])
sql_dataset.expect_column_values_to_be_in_set("first_name", ['Gather', 'Male'])

I am getting:

ProgrammingError: (snowflake.connector.errors.ProgrammingError) 002003 (42S02): SQL compilation error:
Object 'KEBOOLA_274.WORKSPACE_48777448.CANDIDATES' does not exist or not authorized.
[SQL: SELECT count(*) AS element_count, sum(CASE WHEN (first_name IS NULL) THEN %(param_1)s ELSE %(param_2)s END) AS null_count, sum(CASE WHEN (first_name NOT IN (%(first_name_1)s, %(first_name_2)s) AND first_name IS NOT NULL) THEN %(param_3)s ELSE %(param_4)s END) AS unexpected_count 
FROM "WORKSPACE_48777448".candidates]
[parameters: {'param_1': 1, 'param_2': 0, 'first_name_1': 'Gather', 'first_name_2': 'Male', 'param_3': 1, 'param_4': 0}]
(Background on this error at: http://sqlalche.me/e/13/f405)

The issue is the table name is not enclosed in double quotes, hence Snowflake is looking for CANDIDATES instead of candidates. The same issue would arise with the column name should the table be found.

I've tested

sql_dataset = great_expectations.dataset.SqlAlchemyDataset(table_name="\"candidates\"",
                                                           engine=engine,
                                                           schema=creds["schema"])

bu Snowflake is then looking for a table called KEBOOLA_274.WORKSPACE_48777448.""candidates"".

I know SQLAlchemy by default considers lowercase object names to be case insensitive, therefore it doesn't enclose the names. Is there any way around it please?


Solution

  • I work for Superconductive as a developer of Great Expectations. I submitted a fix for this a few weeks back! You can now specify the use_quoted_name property in your batch_kwargs. If this is set to True, it will treat your table and column_names as case sensitive, so you should have no trouble accessing lowercase table names, though you will also need to make sure that the case of your other table and column names are appropriately specified.