Search code examples
python-3.xsqlalchemyruamel.yamlgreat-expectationssnowflake-connector

unable to initialize snowflake data source


I am trying to access the snowflake datasource using "great_expectations" library.

The following is what I tried so far:

from ruamel import yaml

import great_expectations as ge
from great_expectations.core.batch import BatchRequest, RuntimeBatchRequest

context = ge.get_context()



datasource_config = {
    "name": "my_snowflake_datasource",
    "class_name": "Datasource",
    "execution_engine": {
        "class_name": "SqlAlchemyExecutionEngine",
        "connection_string": "snowflake://myusername:mypass@myaccount/myDB/myschema?warehouse=mywh&role=myadmin",
    },
    "data_connectors": {
        "default_runtime_data_connector_name": {
            "class_name": "RuntimeDataConnector",
            "batch_identifiers": ["default_identifier_name"],
        },
        "default_inferred_data_connector_name": {
            "class_name": "InferredAssetSqlDataConnector",
            "include_schema_name": True,
        },
    },
}

print(context.test_yaml_config(yaml.dump(datasource_config)))

I initiated great_expectation before executing above code:

great_expectations init

but I am getting the error below:

great_expectations.exceptions.exceptions.DatasourceInitializationError: Cannot initialize datasource my_snowflake_datasource, error: 'NoneType' object has no attribute 'create_engine'

What am I doing wrong?


Solution

  • Your configuration seems to be ok, corresponding to the example here.

    If you look at the traceback you should notice that the error propagates starting at the file great_expectations/execution_engine/sqlalchemy_execution_engine.py in your virtual environment.

    The actual line where the error occurs is:

                self.engine = sa.create_engine(connection_string, **kwargs)
    

    And if you search for that sa at the top of that file:

    import sqlalchemy as sa
    
    make_url = import_make_url()
    except ImportError:
            sa = None
    

    So sqlalchemy is not installed, which you don't get automatically in your environement if you install greate_expectiations. The thing to do is to install snowflake-sqlalchemy, since you want to use sqlalchemy's snowflake plugin (assumption based on your connection_string).

    /your/virtualenv/bin/python -m pip install snowflake-sqlalchemy
    

    After that you should no longer get an error, it looks like test_yaml_config is waiting for the connection to time out.

    What worries me greatly is the documented use of a deprecated API of ruamel.yaml. The function ruamel.yaml.dump is going to be removed in the near future, and you should use the .dump() method of a ruamel.yaml.YAML() instance.

    You should use the following code instead:

    import sys
    from ruamel.yaml import YAML
    
    import great_expectations as ge
    context = ge.get_context()
    
    datasource_config = {
        "name": "my_snowflake_datasource",
        "class_name": "Datasource",
        "execution_engine": {
            "class_name": "SqlAlchemyExecutionEngine",
            "connection_string": "snowflake://myusername:mypass@myaccount/myDB/myschema?warehouse=mywh&role=myadmin",
        },
        "data_connectors": {
            "default_runtime_data_connector_name": {
                "class_name": "RuntimeDataConnector",
                "batch_identifiers": ["default_identifier_name"],
            },
            "default_inferred_data_connector_name": {
                "class_name": "InferredAssetSqlDataConnector",
                "include_schema_name": True,
            },
        },
    }
    
    yaml = YAML()
    
    yaml.dump(datasource_config, sys.stdout, transform=context.test_yaml_config)
    

    I'll make a PR for great-excpectations to update their documentation/use of ruamel.yaml.