I'm trying to connect to many Pervasive databases using pyodbc and then run a SQL query on each database. The problem I'm facing is that this script takes too long to run because it connects and runs the query one database at a time. I thought multithreading might be a good solution for this. I'm very new to multithreading and was wondering what the best approach for something like this would be?
Any tips would be GREATLY appreciated, thanks for looking.
My connection looks something like this:
import pyodbc
import pandas as pd
import logging
server = '1.1.1.1:111'
database = 'ABC'
username = 'test'
password = 'test123'
list_of_databases = list(large_list)
sql = "SELECT * FROM Table where Date between '20230226' and '20230227'
My loop looks like this, where I establish a connection to all of the databases and collect their cursors:
def connect_to_pervasive(databases, server):
    try:
        logger.info("connecting to Pervasive Server")
        connect_string = 'DRIVER=Pervasive ODBC Interface;SERVERNAME={server_address};DBQ={db}'
        connections = [pyodbc.connect(connect_string.format(db=n, server_address=server)) for n in databases]
        cursors = [conn.cursor() for conn in connections]
        logger.info("Connection established!")
    except Exception as e:
        logger.critical(f"Error: {e}")
    return cursors
Here is where I think I should do my multithreading. Right now this opens MANY connections, runs them one at a time, and doesn't close the connections after they are used. Ideally I'd want multiple connections being queried at the same time.
data = []
try:
    for cur in connections:
        rows = cur.execute(sql).fetchall()
        df = pd.DataFrame.from_records(rows, columns=[col[0] for col in cur.description])
        data.append(df)
except Exception as e:
    print(e)
    logger.critical(f'Error: {e}')
finally:
    for cur in connections:
        cur.close()
I am not too clear on exactly what you want to do, since the code you have doesn't quite match up with your comments. But if you are just doing one SELECT per database connection, then I would think that something like the following will do what you want. If you want something else, I am sure you will get back to me ...
from concurrent.futures import ThreadPoolExecutor
...
# Rest of code as before except new definition of
# connect_to_pervasive:
# Presumably these values are for every database:
server = '1.1.1.1:111'
sql = "SELECT * FROM Table where Date between '20230226' and '20230227'"
def connect_to_pervasive(database):
    """ Connect to a single database. """
    try:
        logger.info("connecting to Pervasive Server")
        connect_string = 'DRIVER=Pervasive ODBC Interface;SERVERNAME={server_address};DBQ={db}'
        connection = pyodbc.connect(connect_string.format(db=database, server_address=server))
        logger.info("Connection established!")
        return connection
    except Exception as e:
        logger.critical(f"Error: {e}")
        raise  # re-raise the exception
def worker(database):
    """ This creates connection, cursor, retrieves rows, closes connection, etc. """
    connection = None
    cursor = None
    try:
        # Same server for every database?
        # If so and server is defined as '1.1.1.1:111', then there is no real
        # need to pass server as an argument:
        connection = connect_to_pervasive(database)
        cursor = connection.cursor()
        rows = cursor.execute(sql).fetchall()
        return pd.DataFrame.from_records(rows, columns=[col[0] for col in cursor.description])
    except Exception as e:
        print(e)
        logger.critical(f'Error: {e}')
        return None
    finally:
        if connection:
            if cursor:
                cursor.close()
            connection.close()
with ThreadPoolExecutor(len(databases)) as executor:
    # If the worker gets an exception, then the corresponding list element
    # will be None.
    data = list(executor.map(worker, databases))
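If the list of databases is large, you may not want one thread per database, and you will probably want to combine the per-database DataFrames afterwards. Here is a minimal sketch of that, assuming the same databases list and worker function as above (the MAX_WORKERS value is just an illustrative cap I picked, not anything prescribed by pyodbc or Pervasive):

from concurrent.futures import ThreadPoolExecutor

import pandas as pd

MAX_WORKERS = 16  # arbitrary cap; tune to what the Pervasive server comfortably handles

with ThreadPoolExecutor(max_workers=max(1, min(MAX_WORKERS, len(databases)))) as executor:
    results = list(executor.map(worker, databases))

# Drop the None entries left by workers that hit an exception,
# then combine everything into one DataFrame.
frames = [df for df in results if df is not None]
combined = pd.concat(frames, ignore_index=True) if frames else pd.DataFrame()

How much concurrency actually helps depends on how many simultaneous connections the Pervasive server tolerates, so treat the cap as a starting point to experiment with.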