Tags: sql-server, pandas, csv, sqlalchemy, pyodbc

Import .csv into local SQL server with pre-made table using Python


I'm making (or trying to make) a small script that combines two .csv files, drops duplicates, then uploads the result to my local MS SQL server. I already have the table made, and I'd prefer to keep it that way, since I need to be able to tweak the server and it's much, much easier doing that through SQL. I'm still very new to Python and don't know much more about SQL, but this site has been supremely helpful so far.
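(Editor's note: the combine-and-dedupe step described above can be sketched with pandas like this. The column names and values here are hypothetical stand-ins for the real exports, and the CSVs are simulated in memory just so the snippet is self-contained.)

```python
import io
import pandas as pd

# Two small CSVs simulated in memory; in the real script these would be
# the two exported .csv files read from disk with pd.read_csv(path).
csv_a = io.StringIO("Serial,Side\nFPN001,L\nFPN002,R\n")
csv_b = io.StringIO("Serial,Side\nFPN002,R\nFPN003,L\n")

# Stack the two files, then drop rows that appear in both.
df = pd.concat([pd.read_csv(csv_a), pd.read_csv(csv_b)], ignore_index=True)
df = df.drop_duplicates()

print(len(df))  # 3 unique rows remain (FPN002,R appeared in both files)
```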

I have the .csv output and connecting to the database down (I think), but I can't get it to pull my .csv and insert it into the database. I've tried a ton of different ways suggested here, but I think I'm too much of a novice to recognize what I'm doing wrong, so it's difficult to be certain. I need the columns of the .csv to go into the same columns in the SQL server. Column 'Side' is the primary key in SQL; all other values are NOT NULL.

Depending on what I try, I'll either get no errors but no input, or I'll get some sort of error. With the current code below, I get kicked back with an error stating that all of my columns are invalid. A few times I've been able to insert incorrect values, or just a single line. That was with the BULK INSERT command, and it stops working if I add more lines, so I've taken a different approach.

Please ignore the extraneous modules I've imported. Once I figure this out, and get my code working properly, I'll remove the ones I don't need.

I've tried pymssql but ran into an issue linking it properly to the database, and I've tried a simple cursor command for INSERT INTO with no luck either.

I've tried the approaches from these questions: Import CSV file into SQL Server using Python and Pandas - CSV to SQL database,

but many of them are difficult to adapt, as they create the table from Python rather than in SQL Server. Eventually I want this compiled into a small .exe that my coworkers can use, and I'd rather just change the SQL server than redistribute a new app version every time there's a change in the table. The values inserted will stay the same, but calculations will change and different columns (surrounding these columns here) will come and go.

Here is a small snippet of example code, showing just the portion that doesn't work correctly.

from six.moves import urllib
import csv
import pymssql
import d6tstack
# Missing imports added: the snippet below uses pandas, pyodbc, sqlalchemy,
# and sqlalchemy's String type, but the original did not import them.
import pandas as pd
import pyodbc
import sqlalchemy
from sqlalchemy.types import String

#Create Engine
params = urllib.parse.quote_plus("Driver={ODBC Driver 18 for SQL Server};"
                      "Server=SERVER;"
                      "Database=AppDev;"
                      "Trusted_Connection=yes;"
                      "TrustServerCertificate=yes"
                      )
engine = sqlalchemy.create_engine("mssql+pyodbc:///?odbc_connect=%s" % params) 
engine.connect() 



#Connect to SQL Database
cnxn = pyodbc.connect("Driver={ODBC Driver 18 for SQL Server};"
                      "Server=SERVER;"
                      "Database=AppDev;"
                      "Trusted_Connection=yes;"
                      "TrustServerCertificate=yes"
                      )
cursor = cnxn.cursor()

# Importing Files (WIP)

df = pd.read_csv(r"C:\PATH\FullOutput.csv", index_col=False)


df.to_sql("Wolverine", engine, if_exists='append', chunksize=300,
          dtype={'Serial': String(22)})

             
cnxn.commit()


#closes connection
cnxn.close()

Here's the error I'm getting:

('42S22', "[42S22] [Microsoft][ODBC Driver 18 for SQL Server][SQL Server]Invalid column name 'index'. (207) (SQLExecDirectW); [42S22] [Microsoft][ODBC Driver 18 for SQL Server][SQL Server]Invalid column name 'Serial'. (207); [42S22] [Microsoft][ODBC Driver 18 for SQL Server][SQL Server]Invalid column name 'index'. (207); [42S22] [Microsoft][ODBC Driver 18 for SQL Server][SQL Server]Invalid column name 'Serial'. (207); [42S22] [Microsoft][ODBC Driver 18 for SQL Server][SQL Server]Statement(s) could not be prepared. (8180)")
[SQL: INSERT INTO [Wolverine] ([index], [Serial]) VALUES (?, ?), (?, ?)]
[parameters: (0, 'FPNLCB804MRK0223179054', 1, 'FPNLCB804MRK0223179054')]

In my code that creates the .csv, I specifically set it not to write an index, and there is no visible index in the output .csv, so the invalid column named 'index' is an issue as well.

Please let me know where I've gone wrong, I'm sure it's more than once, haha. Thank you!


Solution

  • I've figured it out. With pandas/SQLAlchemy, I couldn't get my datatypes to match up with the pre-made MS SQL Server table. What worked for me was deleting the table and just running the Python code, creating a new table. If I want to add columns or a primary key, I can do so after the table is created by SQLAlchemy. Works like a charm every time. Don't waste time fooling around, wondering whether your double or single quotes are the issue. Just let Python make the table and change it in SQL to your liking afterward.
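(Editor's note: the solution above can be sketched as follows. An in-memory SQLite engine is used here just so the snippet is self-contained and runnable; in practice you would swap in the `mssql+pyodbc` engine from the question. Passing `index=False` also keeps the DataFrame index out of the created table, which is where the stray 'index' column in the error came from.)

```python
import pandas as pd
import sqlalchemy
from sqlalchemy.types import String

# Stand-in engine; replace with the mssql+pyodbc engine for SQL Server.
engine = sqlalchemy.create_engine("sqlite://")

# Hypothetical data matching the shape of the question's CSV.
df = pd.DataFrame({"Serial": ["FPNLCB804MRK0223179054"], "Side": ["L"]})

# Let pandas create the table itself (if_exists='replace'), mapping dtypes
# explicitly and suppressing the DataFrame index column.
df.to_sql("Wolverine", engine, if_exists="replace", index=False,
          dtype={"Serial": String(22)})

# The table now exists and can be altered in SQL afterward
# (e.g. ALTER TABLE to add a primary key on Side).
with engine.connect() as conn:
    rows = conn.execute(
        sqlalchemy.text("SELECT Serial, Side FROM Wolverine")).fetchall()
print(rows)  # [('FPNLCB804MRK0223179054', 'L')]
```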