I am trying to access the snowflake datasource using "great_expectations" library.
The following is what I tried so far:
from ruamel import yaml
import great_expectations as ge
from great_expectations.core.batch import BatchRequest, RuntimeBatchRequest
context = ge.get_context()
datasource_config = {
"name": "my_snowflake_datasource",
"class_name": "Datasource",
"execution_engine": {
"class_name": "SqlAlchemyExecutionEngine",
"connection_string": "snowflake://myusername:mypass@myaccount/myDB/myschema?warehouse=mywh&role=myadmin",
},
"data_connectors": {
"default_runtime_data_connector_name": {
"class_name": "RuntimeDataConnector",
"batch_identifiers": ["default_identifier_name"],
},
"default_inferred_data_connector_name": {
"class_name": "InferredAssetSqlDataConnector",
"include_schema_name": True,
},
},
}
print(context.test_yaml_config(yaml.dump(datasource_config)))
I initiated great_expectation before executing above code:
great_expectations init
but I am getting the error below:
great_expectations.exceptions.exceptions.DatasourceInitializationError: Cannot initialize datasource my_snowflake_datasource, error: 'NoneType' object has no attribute 'create_engine'
What am I doing wrong?
Your configuration seems to be ok, corresponding to the example here.
If you look at the traceback you should notice that the error propagates starting at the file great_expectations/execution_engine/sqlalchemy_execution_engine.py
in your virtual environment.
The actual line where the error occurs is:
self.engine = sa.create_engine(connection_string, **kwargs)
And if you search for that sa
at the top of that file:
import sqlalchemy as sa
make_url = import_make_url()
except ImportError:
sa = None
So sqlalchemy is not installed, which you don't get automatically in your environement if you install greate_expectiations. The thing to do is to install snowflake-sqlalchemy, since you want to use sqlalchemy's snowflake plugin (assumption based on your connection_string).
/your/virtualenv/bin/python -m pip install snowflake-sqlalchemy
After that you should no longer get an error, it looks like test_yaml_config
is waiting for the connection
to time out.
What worries me greatly is the documented use of a deprecated API of ruamel.yaml
.
The function ruamel.yaml.dump
is going to be removed in the near future, and you
should use the .dump()
method of a ruamel.yaml.YAML()
instance.
You should use the following code instead:
import sys
from ruamel.yaml import YAML
import great_expectations as ge
context = ge.get_context()
datasource_config = {
"name": "my_snowflake_datasource",
"class_name": "Datasource",
"execution_engine": {
"class_name": "SqlAlchemyExecutionEngine",
"connection_string": "snowflake://myusername:mypass@myaccount/myDB/myschema?warehouse=mywh&role=myadmin",
},
"data_connectors": {
"default_runtime_data_connector_name": {
"class_name": "RuntimeDataConnector",
"batch_identifiers": ["default_identifier_name"],
},
"default_inferred_data_connector_name": {
"class_name": "InferredAssetSqlDataConnector",
"include_schema_name": True,
},
},
}
yaml = YAML()
yaml.dump(datasource_config, sys.stdout, transform=context.test_yaml_config)
I'll make a PR for great-excpectations to update their documentation/use of ruamel.yaml
.