oracle ms-access odbc vba ms-access-2003

How to increase performance for bulk INSERTs to ODBC linked tables in Access?


I have CSV and TXT files to import. I am importing the files into Access and then inserting the records into a linked Oracle table. Each file has around 3 million rows and the process is taking a long time to complete.

Importing into Access is very fast, but inserting into the linked Oracle table is taking an extremely long time.

Here is the process I am currently using:

DoCmd.TransferText acImportFixed, "BUSSEP2014 Link Specification", "tblTempSmartSSP", strFName, False
db.Execute "INSERT INTO METER_DATA ([MPO_REFERENCE]) SELECT MPO_REFERENCE FROM tblTempSmartSSP;"

tblTempSmartSSP is a local Access table and METER_DATA is a linked Oracle table.

I also tried importing the files directly into the linked table, and that was also very slow.

How can I speed up the process?


Solution

  • This situation is not uncommon when dealing with bulk INSERTs to ODBC linked tables in Access. In the case of the following Access query

    INSERT INTO METER_DATA (MPO_REFERENCE) 
    SELECT MPO_REFERENCE FROM tblTempSmartSSP
    

    where [METER_DATA] is an ODBC linked table and [tblTempSmartSSP] is a local (native) Access table, the Access Database Engine is limited in how clever it can be, because it has to accommodate a wide range of target databases whose capabilities vary greatly. Unfortunately, this often means that, despite the single Access SQL statement, what actually gets sent to the remote (linked) database is a separate INSERT (or equivalent) for each row in the local table. Understandably, that can be very slow if the local table contains a large number of rows.

    Option 1: Native bulk inserts to the remote database

    Most databases have one or more native mechanisms for the bulk loading of data: Microsoft SQL Server has "bcp" and BULK INSERT, and Oracle has "SQL*Loader". These mechanisms are optimized for bulk operations and will usually offer significant speed advantages. In fact, if the data needs to be imported into Access and "massaged" before being transferred to the remote database, it can still be faster to dump the modified data back out to a text file and then bulk import it into the remote database.
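
    As a rough illustration of the dump-and-bulk-load route for Oracle, the Python sketch below writes the staged values to a text file and generates a matching SQL*Loader control file. The helper name `write_sqlldr_files`, the file names, and the single-column layout are assumptions made for illustration, not part of the original question.

```python
import csv
from pathlib import Path


def write_sqlldr_files(values, out_dir, table="METER_DATA", column="MPO_REFERENCE"):
    """Write a data file and a SQL*Loader control file (hypothetical helper).

    `values` is any iterable of MPO_REFERENCE values, for example rows
    exported from the local Access table.
    """
    out_dir = Path(out_dir)
    data_path = out_dir / "meter_data.csv"  # assumed file name
    ctl_path = out_dir / "meter_data.ctl"   # assumed file name

    # One value per line; the csv module quotes fields where necessary.
    with data_path.open("w", newline="") as f:
        writer = csv.writer(f, lineterminator="\n")
        for value in values:
            writer.writerow([value])

    # Minimal control file: append comma-separated rows to the target table.
    control = (
        "LOAD DATA\n"
        f"INFILE '{data_path.name}'\n"
        "APPEND\n"
        f"INTO TABLE {table}\n"
        "FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '\"'\n"
        f"({column})\n"
    )
    ctl_path.write_text(control)
    return data_path, ctl_path
```

    The load itself would then be run with the `sqlldr` command-line tool from the directory containing both files, passing `control=meter_data.ctl` along with credentials for the target database. SQL*Loader also has a direct path mode (`direct=true`) that may be faster still, depending on the table's constraints.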

    Option 2(a): Using Python and pandas

    pyodbc with fast_executemany=True can upload rows much faster than INSERT INTO … SELECT … on a linked table. See this answer for details.
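
    A minimal sketch of this option using pyodbc directly (rather than pandas) might look like the following. It assumes two already-open pyodbc connections, one to the Access database and one to the remote database; the function name `copy_mpo_references` and the batch size are illustrative. `fast_executemany` is a pyodbc cursor attribute whose benefit depends on the ODBC driver. It is best known for working with Microsoft's SQL Server drivers, so whether it helps with a particular Oracle driver is something to test.

```python
def copy_mpo_references(src_cnxn, dst_cnxn, batch_size=10000):
    """Copy MPO_REFERENCE values from Access to the remote table in batches.

    Both arguments are expected to be open pyodbc connections, e.g. created
    with pyodbc.connect(...); connection strings are not shown here.
    Returns the number of rows sent.
    """
    src = src_cnxn.cursor()
    dst = dst_cnxn.cursor()
    # Ask pyodbc to send parameter arrays instead of one round trip per row.
    dst.fast_executemany = True

    src.execute("SELECT MPO_REFERENCE FROM tblTempSmartSSP")
    insert_sql = "INSERT INTO METER_DATA (MPO_REFERENCE) VALUES (?)"
    total = 0
    while True:
        rows = src.fetchmany(batch_size)
        if not rows:
            break
        # Each source row becomes a one-element parameter tuple.
        dst.executemany(insert_sql, [tuple(row) for row in rows])
        total += len(rows)
    # pyodbc connections do not autocommit by default.
    dst_cnxn.commit()
    return total
```

    Reading with `fetchmany` keeps memory use bounded, which matters for files with millions of rows.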

    Option 2(b): Using a pass-through query in Access

    If the bulk import mechanisms are not a feasible option, then another possibility is to build one or more pass-through queries in Access to upload the data using INSERT statements that can insert more than one row at a time.

    For example, if the remote database were SQL Server (2008 or later), then we could run an Access pass-through (T-SQL) query like this

    INSERT INTO METER_DATA (MPO_REFERENCE) VALUES (1), (2), (3)
    

    to insert three rows with one INSERT statement.

    According to an answer to an earlier question here, the corresponding syntax for Oracle would be

    INSERT ALL
        INTO METER_DATA (MPO_REFERENCE) VALUES (1)
        INTO METER_DATA (MPO_REFERENCE) VALUES (2)
        INTO METER_DATA (MPO_REFERENCE) VALUES (3)
    SELECT * FROM DUAL;
    

    I tested this approach with SQL Server (as I don't have access to an Oracle database) using a native [tblTempSmartSSP] table with 10,000 rows. The code ...

    Sub LinkedTableTest()
        Dim cdb As DAO.Database
        Dim t0 As Single
        
        t0 = Timer
        Set cdb = CurrentDb
        cdb.Execute _
                "INSERT INTO METER_DATA (MPO_REFERENCE) " & _
                "SELECT MPO_REFERENCE FROM tblTempSmartSSP", _
                dbFailOnError
        Set cdb = Nothing
        Debug.Print "Elapsed time " & Format(Timer - t0, "0.0") & " seconds."
    End Sub
    

    ... took approximately 100 seconds to execute in my test environment.

    By contrast the following code, which builds multi-row INSERTs as described above (using what Microsoft calls a Table Value Constructor) ...

    Sub PtqTest()
        ' T-SQL table value constructors accept at most 1000 rows per INSERT
        Const BATCH_SIZE As Long = 1000
        Dim cdb As DAO.Database, rst As DAO.Recordset
        Dim t0 As Single, i As Long, valueList As String, separator As String
    
        t0 = Timer
        Set cdb = CurrentDb
        Set rst = cdb.OpenRecordset("SELECT MPO_REFERENCE FROM tblTempSmartSSP", dbOpenSnapshot)
        i = 0
        valueList = ""
        separator = ""
        Do Until rst.EOF
            i = i + 1
            ' Values are concatenated unquoted, so MPO_REFERENCE must be numeric and non-Null
            valueList = valueList & separator & "(" & rst!MPO_REFERENCE & ")"
            separator = ","
            If i = BATCH_SIZE Then
                ' Batch is full: send it and start a new one
                SendInsert valueList
                i = 0
                valueList = ""
                separator = ""
            End If
            rst.MoveNext
        Loop
        ' Send any remaining rows as a final partial batch
        If i > 0 Then
            SendInsert valueList
        End If
        rst.Close
        Set rst = Nothing
        Set cdb = Nothing
        Debug.Print "Elapsed time " & Format(Timer - t0, "0.0") & " seconds."
    End Sub
    
    Sub SendInsert(valueList As String)
        ' Runs one multi-row INSERT on the server via a temporary pass-through query
        Dim cdb As DAO.Database, qdf As DAO.QueryDef
        
        Set cdb = CurrentDb
        Set qdf = cdb.CreateQueryDef("")  ' empty name = temporary QueryDef
        ' Reuse the linked table's ODBC connection string
        qdf.Connect = cdb.TableDefs("METER_DATA").Connect
        qdf.ReturnsRecords = False
        qdf.SQL = "INSERT INTO METER_DATA (MPO_REFERENCE) VALUES " & valueList
        qdf.Execute dbFailOnError
        Set qdf = Nothing
        Set cdb = Nothing
    End Sub
    

    ... took between 1 and 2 seconds to produce the same results.

    (T-SQL Table Value Constructors are limited to inserting 1000 rows at a time, so the above code is a bit more complicated than it would be otherwise.)
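
    For an Oracle target, the same batching idea could be applied with the INSERT ALL syntax shown earlier. The Python sketch below only builds the statement text for each batch. The function names, the batch size of 500, and the assumption that the values are integers are all illustrative, and each statement would still have to be sent through a pass-through query or another Oracle connection. The trailing semicolon is left off because some drivers reject it when a single statement is sent, which is worth checking in your own environment.

```python
def build_insert_all_batches(values, batch_size=500,
                             table="METER_DATA", column="MPO_REFERENCE"):
    """Yield Oracle INSERT ALL statements covering `values` in batches."""
    batch = []
    for value in values:
        # int() rejects anything that is not a plain integer before it
        # is spliced into the SQL text (assumes a numeric column).
        batch.append(f"    INTO {table} ({column}) VALUES ({int(value)})")
        if len(batch) == batch_size:
            yield _render(batch)
            batch = []
    # Emit the final partial batch, if any.
    if batch:
        yield _render(batch)


def _render(into_clauses):
    """Wrap a list of INTO clauses in a single INSERT ALL statement."""
    return "INSERT ALL\n" + "\n".join(into_clauses) + "\nSELECT * FROM DUAL"
```

    Because this was not tested against Oracle in the original answer either, the timing gains reported above for SQL Server should not be assumed to carry over unchanged.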