Search code examples
pythonoracle-databasesqlalchemycx-oracle

SQLAlchemy bulk insert blob data in Oracle DB


I'm trying to make bulk insert operation in SQLAlchemy with Oracle DB, which inserts 60k rows with blob data.

This is how my table and code looks like:

CREATE TABLE datatables (
    id INTEGER NOT NULL, 
    table_name VARCHAR2(50 CHAR), 
    row_id VARCHAR2(50 CHAR), 
    row_data BLOB, 
    PRIMARY KEY (id)
)
with Session() as session:
    session.execute(
        DataTables.__table__.insert(),
        datas
    )

Where DataTables is SQLAclhemy class mapping for table and datas is list of dicts like this {'id': 1, 'table_name': 'app', 'row_id': 'version', 'row_data': '....'}

with such code i got this sql statement

sqlalchemy.engine.Engine INSERT INTO datatables (id, table_name, row_id, row_data) VALUES (:id, :table_name, :row_id, :row_data)
sqlalchemy.engine.Engine [generated in 0.14718s] [{'id': 1, 'table_name': 'app', 'row_id': 'version', 'row_data': b'some_data'}, ...]

and it runs forever, even after 30 minutes it is not finished. When I enable DPI trace there are a lot of such rows:

ODPI [12912] 2021-12-18 00:07:36.019: ref 0000019C3D15C5B0 (dpiConn) -> 8
ODPI [12912] 2021-12-18 00:07:36.020: ref 0000019C413F0C20 (dpiLob) -> 1 [NEW]
ODPI [12912] 2021-12-18 00:07:36.020: ref 0000019C3D15C5B0 (dpiConn) -> 9
ODPI [12912] 2021-12-18 00:07:36.020: ref 0000019C413F1990 (dpiLob) -> 1 [NEW]

However if I use raw sql:

session.execute('insert into DATATABLES (id, table_name, row_id, row_data) values (:id, :table_name, :row_id, :row_data)', datas)

DPI trace changed to this:

ODPI [00796] 2021-12-18 00:14:55.246: ref 000002486741EAF0 (dpiVar) -> 0
ODPI [00796] 2021-12-18 00:14:55.246: ref 00000248617D2DF0 (dpiConn) -> 6
ODPI [00796] 2021-12-18 00:14:55.246: fn end dpiVar_release(000002486741EAF0) -> 0
ODPI [00796] 2021-12-18 00:14:55.246: fn start dpiVar_setFromBytes(000002486741EBB0)
ODPI [00796] 2021-12-18 00:14:55.247: fn end dpiVar_setFromBytes(000002486741EBB0) -> 0
ODPI [00796] 2021-12-18 00:14:55.247: fn start dpiVar_setFromBytes(0000024864E5FDE0)
ODPI [00796] 2021-12-18 00:14:55.247: fn end dpiVar_setFromBytes(0000024864E5FDE0) -> 0
ODPI [00796] 2021-12-18 00:14:55.247: fn start dpiVar_setFromBytes(0000024864E601A0)
ODPI [00796] 2021-12-18 00:14:55.247: fn end dpiVar_setFromBytes(0000024864E601A0) -> 0

and bulk insert is done in 15 seconds.

SQL statement is same in both cases. Why does it work diffent? Of course I can use raw sql, but I want to use DataTables.__table__.insert(), because if table or columns names will change - I don't need to fix sql for this every time.


Solution

  • issues like these should preferably be reported as bugs in SQLAlchemy, which you can do at : https://github.com/sqlalchemy/sqlalchemy/issues/ otherwise we were completely not aware of this problem.

    in this case we've been alerted to this issue and will seek to update our LOB settings per cx_oracle developers for SQLAlchemy 2.0, which is tracked at https://github.com/sqlalchemy/sqlalchemy/issues/7494