Search code examples
pythonsqloracle-databasepyodbccx-oracle

Inner join query from SQL Server and Oracle with Python


Let's say I have a query like such:

SELECT Oracle.Column1, SqlServer.Column1
FROM SqlServer
INNER JOIN Oracle
on SqlServer.ID = Oracle.ID

I have established cursor connections in python to their respective databases using Cx_Oracle and pyodbc. Individual queries work great. Subqueries I can do. But this query with an inner join that requires both a connection to Oracle AND SQL Server I'm lost. How do I break this up or approach this programmatically?

My current version involves creating temp tables but that only works in SQL Server, not Oracle! The goal would be to use cursors to connect to their respective servers(can currently accomplish) but then what?


Solution

  • Simply, you cannot run a query across remote databases as if they are local tables. Within each cursor of a database connection, its universe is limited to the connecting server, schema, or catalog/database you specified. Only those objects (tables, functions/procedures, etc.) are available. Nothing beyond is recognized.

    With that said, most relational databases today now support external server connections including Oracle's Database Links, SQL Server's Linked Servers, DB2's Catalog, Postgres' Foreign Data Wrappers, MySQL's Federated Storage Engine, MS Access' Linked Tables, SQLite's ATTACH (but only other SQLite databases).

    Therefore, consider establishing a remote connection either in Oracle or SQL Server and then if connecting user is allowed access, run the needed JOIN query. Below are very simple examples of connection and querying. Do your research for appropriate connection parameters. Also, do remember an Oracle schema (just the user) is different from an SQL Server schema (namespace container of objects).

    Oracle

    A database link is a connection between two physical database servers that allows a client to access them as one logical database.

    CREATE PUBLIC DATABASE LINK ... USING 'mssql_db';  -- TO BE RUN ONCE
    
    SELECT o.Column1, s.Column1
    FROM oracle_local_table o
    INNER JOIN mssql_db.sql_server_remote_table s
      ON s.ID = o.ID
    

    SQL Server

    A linked server allows for access to distributed, heterogeneous queries against OLE DB data sources. After a linked server is created, distributed queries can be run against this server, and queries can join tables from more than one data source.

    EXEC master.dbo.sp_addlinkedserver           -- TO BE RUN ONCE
           @server = N'oracle_server', 
           @srvproduct=N'...', 
           @provider=N'...', 
           @provstr=N'...'
    
    SELECT o.Column1, s.Column1
    FROM oracle_server..user.oracle_remote_table o
    INNER JOIN sql_server_local_table s
      ON s.ID = o.ID
    
    -- ALTERNATIVELY
    SELECT o.Column1, s.Column1
    FROM OPENQUERY (oracle_server, 'SELECT * FROM scott.oracle_remote_table') o
    INNER JOIN sql_server_local_table s
      ON s.ID = o.ID 
    

    Alternatives in Python if above prove too difficult or requires high level privileged permission:

    • Create a temp table in either side and dump remote table records via cursor.fetch and cursor.execute or cursor.executemany and then run JOIN query locally.

    • Import both tables into an SQLite database (disk or in-memory instance) and run JOIN query. Note: in Python 3, sqlite3 is part of the standard library and hence ships with Python installations.

    • Use Pythons's third party module, pandas, to import both tables into data frames and run merge (counterpart to SQL's JOIN). Interfacing with sqlalchemy, pandas can even dump data frames in one call to databases using to_sql.