I'm trying to upload a decent amount (~300k rows) worth of data to a database using pyodbc. Since the data will need to be updated quarterly (from spreadsheets), what I'm trying to create is a sort of dynamic insert statement to make things a bit more streamlined. My thought process is that I can name the header of each column in the spreadsheets the same as the column in the table where I want the respective data to be uploaded.
What I've attempted to do is write a script that pulls out the column names in the header row on the worksheet and uses these as the variables, eliminating the need for me to change any code if I'm uploading different data from different excel sheets back to back:
import xlrd
import numpy as np
import pyodbc
wb=xlrd.open_workbook(r"FILE")
worksheet = wb.sheet_by_name('SHEET_NAME')
num_rows = worksheet.nrows
num_cols = worksheet.ncols
header_row = 0
header_list = [worksheet.row_values(header_row)]
header_values = ", ".join([str(i) for i in [cell.value for cell in worksheet.row(header_row)]])
question_marks = ",?" * (num_cols - 1)
cell=worksheet.cell(1,1)
arr = []
for rowind in range(num_rows)[(header_row + 1):]:
arr.append([cell.value for cell in worksheet.row(rowind)])
data = np.rec.fromrecords(arr, names=header_values)
cnxn = pyodbc.connect(r"DRIVER={SQL Server};SERVER=XXXXXXXXXX
\DEV_CON1;DATABASE=GGG;UID=AAA_dbo;PWD=XXXXXXXXX;")
cursor = cnxn.cursor()
populate_db = "insert into tblSnap_TEST(" + header_values + ") values (?" + question_marks + ")"
for i in header_list:
i = data[i]
values = header_list
cursor.execute(populate_db,values)
cursor.close
cnxn.commit
cnxn.close`
When I attempt to run the script I get the following error message:
File "<ipython-input-45-6809dc4a27ac>", line 1, in <module>
runfile('H:/My Documents/Python/string_search_test.py', wdir='H:/My Documents/Python')
File "C:\Users\xxxx\xxx\xx\x\Anaconda\lib\site- packages\spyderlib\widgets\externalshell\sitecustomize.py", line 580, in runfile
execfile(filename, namespace)
File "H:/My Documents/Python/string_search_test.py", line 67, in <module>
cursor.execute(populate_db,values)
ProgrammingError: ('The SQL contains 21 parameter markers, but 1 parameters were supplied', 'HY000')
The way I've done this before is by explicitly defining the values to be passed, such as:
account_numbers = (sheet.cell(row_index, 1).value)
But like I said, what I'm trying to do here is make it so I wouldn't have to type that out. That is what I was attempting to do with i = data[i]. I'm hoping there's a way for Python to recognize that "account_numbers" is in a list (created from the worksheet's headers), then grab the corresponding data based on something similar to i = data[i]
which I tried above. Is a solution like this possible? data[i] does return the right data I want to insert into the table for each column, but it's not recognized by the execute statement.
If you're not dealing with large Excel worksheets, or any problematic data types (such as one described in the section Dates in Excel spreadsheets), you can simplify reading all rows into a single list, pop the header values for the insert columns, then call Cursor.executemany once to insert all values from the spreadsheet, passing a sequence of sequences for parameter values.
I've removed the numpy array population, since it's not necessary in the snippet provided.
header_row = 0
# build list of lists that represents row values in worksheet,
# including column names from header row
rows = [worksheet.row_values(row) for row in range(worksheet.nrows)]
# extract list of column names to use for insert statement, values to be inserted remain
columns = rows.pop(header_row)
cnxn = pyodbc.connect(r"DRIVER={SQL Server};SERVER=XXXXXXXXXX\DEV_CON1;DATABASE=GGG;UID=AAA_dbo;PWD=XXXXXXXXX;")
cursor = cnxn.cursor()
# list of column names used to build SQL statement, including parameter placeholders (?)
populate_db = "insert into tblSnap_TEST ({}) values ({})".format(', '.join(columns),
', '.join('?' * len(columns)))
# insert is executed once for each sequence of parameter values
cursor.executemany(populate_db, rows)
cnxn.commit()
cnxn.close()