Search code examples
pythonoraclegreat-expectationspython-oracledb

Using Python Great Expectations and python-oracledb


I would like to set up an oracledb as a datasource but can't get the syntax. Is this even possible?

import oracledb
from sqlalchemy import create_engine
import getpass

username = getpass.getuser()
userpwd = getpass.getpass()
dsn="my_server"
# connect to oracle database
connection=oracledb.connect(
     user=username,
     password=userpwd,
     dsn=dsn
)
# create engine
engine = create_engine('oracle+oracledb://', creator=lambda: connection)

This is successful for connecting and passing to, for example pandas.read_sql.

But I want to use gx:

import great_expectations as gx
context = gx.get_context()
context.sources.add...?

I tried to get a connection string

cp = oracledb.ConnectParams(user=username, password=userpwd,host=dsn)
conn_string = cp.get_connect_string()
datasource = context.sources.add_sql(
    name="my_datasource", connection_string=conn_string
)

But this dowsn't work

TestConnectionError: Attempt to connect to datasource failed with the following error message: Unable to create a SQLAlchemy engine due to the following exception: Could not parse SQLAlchemy URL from string '(DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=myserver)(PORT=1521)))'

Any ideas?


Solution

  • This creates a data source for me:

    import getpass
    import os
    import oracledb
    import great_expectations as gx
    
    un = os.environ.get('PYTHON_USERNAME')
    cs = os.environ.get('PYTHON_CONNECTSTRING')
    pw = getpass.getpass(f'Enter password for {un}@{cs}: ')
    
    # I use Easy Connect strings like "localhost/orclpdb1".  These two lines
    # let me access the components individually
    cp = oracledb.ConnectParams()
    cp.parse_connect_string(cs)
    
    connection_string = f'oracle+oracledb://{un}:{pw}@{cp.host}:{cp.port}/?service_name={cp.service_name}'
    
    context = gx.get_context()
    
    datasource = context.sources.add_sql(
        name="my_datasource", connection_string=connection_string
    )
    
    asset_name = "my_query_asset"
    query = "select * from employees"
    
    query_asset = datasource.add_query_asset(name=asset_name, query=query)
    
    for i in query_asset:
        print(i)
    
    

    Running it gives:

    ('name', 'my_query_asset')
    ('type', 'query')
    ('id', None)
    ('order_by', [])
    ('batch_metadata', {})
    ('splitter', None)
    ('query', 'select * from employees')
    

    For information on connecting oracledb with SQLAlchemy, see Using SQLAlchemy 2.0 with python-oracledb for Oracle Database.