Search code examples
pythonpython-multiprocessingpython-multithreadingpyodbc

Does pyodbc support multiprocessing?


The intent of the code below is to perform several database updates in parallel using pyodbc and mysql. Question: are the connections opened independently and in parallel? Will this code work or I need to take a different approach? I tried to investigate pyodbc and multithreading and didn't find much.

import multiprocessing, pyodbc

tables = ['tab1', 'tab2', 'tab3', 'tab4', 'tab5', 'tab6']
for table in tables:
    p = multiprocessing.Process(target=process_table, args=(table,))
    p.start()


def process_table(table):

    conn = pyodbc.connect(.....)
    cursor = conn.cursor()
    
    for i in range(0, 10):   
        sql = 'update ' + table + ' set col = ' + str(i)
        cursor.execute(sql)
    
    conn.commit()

    cursor.close()
    conn.close()

Solution

  • Let's get to the point.

    1. are the connections opened independently and in parallel?
      • Yes, but there's a catch depending on your expectations.
    2. Will this code work or I need to take a different approach?
      • This code should work. But you need to understand that Databases are I/O constrained and not compute constrained. When you create a new process, the underlying I/O limits still remains the same. It's better to use a Thread instead. Learn about the parameters and then evaluate how it will benefit your use case.

    Here's an example code on how to do it with a Thread:

    from concurrent.futures import ThreadPoolExecutor, ProcessPoolExecutor
    import pyodbc
    
    executor = ThreadPoolExecutor() # or ProcessPoolExecutor
    tables = ['tab1', 'tab2', 'tab3', 'tab4', 'tab5', 'tab6']
    futures = []
    for table in tables:
        result = executor.submit(process_table, table)
        futures.append(result)
    
    results = [future.result() for future in futures]
    
    def process_table(table):
    
        conn = pyodbc.connect(.....)
        cursor = conn.cursor()
        
        for i in range(0, 10):   
            sql = 'update ' + table + ' set col = ' + str(i)
            cursor.execute(sql)
        
        conn.commit()
    
        cursor.close()
        conn.close()
    

    Let's delve into some useful parameters in ODBC as listed in the reference:

    Parameter Description
    SQL_MULT_RESULT_SETS A character string: "Y" if the data source supports multiple result sets, "N" if it does not. For more information about multiple result sets, see Multiple Results.
    SQL_MULTIPLE_ACTIVE_TXN A character string: "Y" if the driver supports more than one active transaction at the same time, "N" if only one transaction can be active at any time. The information returned for this information type does not apply in the case of distributed transactions.
    SQL_MAX_DRIVER_CONNECTIONS An SQLUSMALLINT value that specifies the maximum number of active connections that the driver can support for an environment. This value can reflect a limitation imposed by either the driver or the data source. If there is no specified limit or the limit is unknown, this value is set to zero.
    SQL_MAX_CONCURRENT_ACTIVITIES An SQLUSMALLINT value that specifies the maximum number of active statements that the driver can support for a connection. A statement is defined as active if it has results pending, with the term "results" meaning rows from a SELECT operation or rows affected by an INSERT, UPDATE, or DELETE operation (such as a row count), or if it is in a NEED_DATA state. This value can reflect a limitation imposed by either the driver or the data source. If there is no specified limit or the limit is unknown, this value is set to zero.
    SQL_MAX_ASYNC_CONCURRENT_STATEMENTS An SQLUINTEGER value that specifies the maximum number of active concurrent statements in asynchronous mode that the driver can support on a given connection. If there is no specific limit or the limit is unknown, this value is zero.

    You can know their values by printing them:

    print(conn.getinfo(pyodbc.SQL_MAX_DRIVER_CONNECTIONS))
    print(conn.getinfo(pyodbc.SQL_MAX_DRIVER_CONNECTIONS))
    print(conn.getinfo(pyodbc.SQL_MAX_CONCURRENT_ACTIVITIES))
    print(conn.getinfo(pyodbc.SQL_MAX_ASYNC_CONCURRENT_STATEMENTS))
    

    Additionally, there is a protocol to allow a single connection to have multiple concurrent queries running at the same time:

    SQL Server 2005 (9.x) introduced a new connection attribute that allows applications to have more than one pending request per connection, and in particular, to have more than one active default result set per connection called Multiple Active Result Sets (MARS). In addition, a new connection string keyword, Mars_Connection, as been added. It accepts "yes" or "no" values; "no" is the default. For pyodbc, you can enable it by adding MARS_Connection=yes to your connection string.

    However, the execution of multiple queries on a single connection in MARS is interleaved not concurrent. If you want two queries to actually execute at the same time, use two connections (as you do through a separate process). The driver can only process one query at a time on a connection.