The intent of the code below is to perform several database updates in parallel using pyodbc and mysql. Question: are the connections opened independently and in parallel? Will this code work or I need to take a different approach? I tried to investigate pyodbc and multithreading and didn't find much.
import multiprocessing, pyodbc
tables = ['tab1', 'tab2', 'tab3', 'tab4', 'tab5', 'tab6']
for table in tables:
p = multiprocessing.Process(target=process_table, args=(table,))
p.start()
def process_table(table):
conn = pyodbc.connect(.....)
cursor = conn.cursor()
for i in range(0, 10):
sql = 'update ' + table + ' set col = ' + str(i)
cursor.execute(sql)
conn.commit()
cursor.close()
conn.close()
Let's get to the point.
Here's an example code on how to do it with a Thread:
from concurrent.futures import ThreadPoolExecutor, ProcessPoolExecutor
import pyodbc
executor = ThreadPoolExecutor() # or ProcessPoolExecutor
tables = ['tab1', 'tab2', 'tab3', 'tab4', 'tab5', 'tab6']
futures = []
for table in tables:
result = executor.submit(process_table, table)
futures.append(result)
results = [future.result() for future in futures]
def process_table(table):
conn = pyodbc.connect(.....)
cursor = conn.cursor()
for i in range(0, 10):
sql = 'update ' + table + ' set col = ' + str(i)
cursor.execute(sql)
conn.commit()
cursor.close()
conn.close()
Let's delve into some useful parameters in ODBC as listed in the reference:
Parameter | Description |
---|---|
SQL_MULT_RESULT_SETS | A character string: "Y" if the data source supports multiple result sets, "N" if it does not. For more information about multiple result sets, see Multiple Results. |
SQL_MULTIPLE_ACTIVE_TXN | A character string: "Y" if the driver supports more than one active transaction at the same time, "N" if only one transaction can be active at any time. The information returned for this information type does not apply in the case of distributed transactions. |
SQL_MAX_DRIVER_CONNECTIONS | An SQLUSMALLINT value that specifies the maximum number of active connections that the driver can support for an environment. This value can reflect a limitation imposed by either the driver or the data source. If there is no specified limit or the limit is unknown, this value is set to zero. |
SQL_MAX_CONCURRENT_ACTIVITIES | An SQLUSMALLINT value that specifies the maximum number of active statements that the driver can support for a connection. A statement is defined as active if it has results pending, with the term "results" meaning rows from a SELECT operation or rows affected by an INSERT, UPDATE, or DELETE operation (such as a row count), or if it is in a NEED_DATA state. This value can reflect a limitation imposed by either the driver or the data source. If there is no specified limit or the limit is unknown, this value is set to zero. |
SQL_MAX_ASYNC_CONCURRENT_STATEMENTS | An SQLUINTEGER value that specifies the maximum number of active concurrent statements in asynchronous mode that the driver can support on a given connection. If there is no specific limit or the limit is unknown, this value is zero. |
You can know their values by printing them:
print(conn.getinfo(pyodbc.SQL_MAX_DRIVER_CONNECTIONS))
print(conn.getinfo(pyodbc.SQL_MAX_DRIVER_CONNECTIONS))
print(conn.getinfo(pyodbc.SQL_MAX_CONCURRENT_ACTIVITIES))
print(conn.getinfo(pyodbc.SQL_MAX_ASYNC_CONCURRENT_STATEMENTS))
Additionally, there is a protocol to allow a single connection to have multiple concurrent queries running at the same time:
SQL Server 2005 (9.x) introduced a new connection attribute that allows applications to have more than one pending request per connection, and in particular, to have more than one active default result set per connection called Multiple Active Result Sets (MARS). In addition, a new connection string keyword, Mars_Connection
, as been added. It accepts "yes" or "no" values; "no" is the default. For pyodbc, you can enable it by adding MARS_Connection=yes
to your connection string.
However, the execution of multiple queries on a single connection in MARS is interleaved not concurrent. If you want two queries to actually execute at the same time, use two connections (as you do through a separate process). The driver can only process one query at a time on a connection.