Tags: python, sql-server, pandas, sqlalchemy, pyqt5

Implicit conversion from data type varchar(max) to varbinary(max) is not allowed


I have a method called appendTable() which takes the name of the table and column=value pairs as keyword arguments. I use the keyword arguments to build a DataFrame object and then call dataframe.to_sql() to append the row to my database table, shown here:

def appendTable(self, tableName, **kwargs):
    dataFrame = pd.DataFrame(data=[kwargs])
    print(dataFrame)

    with self.connection_handling():
        with threadLock:
            dataFrame.to_sql(tableName, con=self.connection.dbEngine, schema="dbo", index=False, if_exists='append')

For example, I would call this method like this:

self.appendTable(tableName="Notebook", FormID=ID, CompressedNotes=notebook)

My table design is in Microsoft SQL Server and looks something like this:

NotebookID       | int            | primary auto-incrementing key
FormID           | int            | foreign key to a form table
Notes            | varchar(MAX)   | allow-nulls : True
CompressedNotes  | varbinary(MAX) | allow-nulls : True

The data I'm passing comes from a PyQt5 QTextEdit (used as a notebook), which gives me the text/images as HTML. I then encode that string and compress it using zlib.compress(), shown here:

notebook_html = self.noteBookTextEdit.toHtml()
notebookData  = zlib.compress(notebook_html.encode())
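
For reference, loading the notes back out is just the mirror image of this pipeline; a minimal sketch, assuming notebookData holds the compressed bytes from above:

    import zlib

    # Mirror of the save path: decompress the stored bytes, then
    # decode them back into the HTML string.
    restored_html = zlib.decompress(notebookData).decode()

    # The widget could then be repopulated, e.g.:
    # self.noteBookTextEdit.setHtml(restored_html)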

I print the data type and the dataframe and find it's the same type it has always been. I'm also appending to a database table and server that I've been using for years.

Notebook data type: <class 'bytes'>
        FormID                   CompressedNotes
0          163  b'x\x9c\x03\x00\x00\x00\x00\x01'

The SQL that gets generated looks like this:

SQL: INSERT INTO dbo.[Notebook] ([FormID], [CompressedNotes]) VALUES (?, ?)
parameters: ('163', b'x\x9c\x03\x00\x00\x00\x00\x01')

Recently, though, when I pass binary data for a column that is VARBINARY(MAX), this error appears:

    Could not execute cursor!
    Reason: (pyodbc.ProgrammingError) ('42000', '[42000] [Microsoft][ODBC Driver 17 for SQL Server][SQL Server]Implicit conversion from data type varchar(max) to varbinary(max) is not allowed. Use the CONVERT function to run this query. (257) (SQLExecDirectW); 
    [42000] [Microsoft][ODBC Driver 17 for SQL Server][SQL Server]Statement(s) could not be prepared. (8180)')
    [SQL: INSERT INTO dbo.[Notebook] ([FormID], [CompressedNotes]) VALUES (?, ?)]
    [parameters: ('163', b'x\x9c\x03\x00\x00\x00\x00\x01')]
    (Background on this error at: https://sqlalche.me/e/20/f405)

The only thing I've changed since this issue started is that I now run the appendTable() method through a QThread() instead of threading.Thread(), because I wanted access to some signals and slots. I still use a thread lock to make sure multiple threads aren't trying to use my database engine at the same time, and I've been doing that for a very long time, but I'm unsure whether the thread lock works with QThreads (I thought it did).
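
A plain threading.Lock does work across QThreads, because a QThread drives an ordinary OS thread under the hood; a minimal sketch (AppendWorker is a hypothetical name, not code from above):

    import threading
    from PyQt5.QtCore import QThread

    threadLock = threading.Lock()

    class AppendWorker(QThread):
        # QThread runs a real OS thread, so a Python-level lock
        # acquired inside run() excludes other QThreads and
        # threading.Thread instances alike.
        def run(self):
            with threadLock:
                pass  # e.g. call appendTable() / touch the shared engine here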


UPDATE:

When I use my pyodbc cursor to write the SQL statement myself, instead of letting the pandas.DataFrame.to_sql() method generate what looks like the same statement, it all works. I'm passing the exact same variables with the same data types, and it succeeds even without the CONVERT function that the error message suggests.

cursor.execute('INSERT INTO Notebook (FormID, CompressedNotes) VALUES (?, ?)', (FormID, notebook))
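
In full context, that workaround looks roughly like this; a sketch where the connection details are an assumption, with an explicit commit since pyodbc connections are not autocommit by default:

    import pyodbc

    # Connection string is illustrative only.
    conn = pyodbc.connect(
        "DRIVER={ODBC Driver 17 for SQL Server};"
        "SERVER=myserver;DATABASE=mydb;Trusted_Connection=yes;"
    )
    cursor = conn.cursor()
    cursor.execute(
        'INSERT INTO Notebook (FormID, CompressedNotes) VALUES (?, ?)',
        (FormID, notebook),
    )
    conn.commit()  # pyodbc defaults to autocommit=False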

Is pandas.DataFrame() converting my bytes object into something else, or am I just missing something? I'm using Python 3.11.2 and pandas 1.5.3, and before I moved anything into the QThread() this worked with those versions.
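
One way to see what pandas is doing with the bytes is to inspect the column dtypes; a quick sketch with the sample values from above:

    import pandas as pd

    df = pd.DataFrame(data=[{"FormID": 163,
                             "CompressedNotes": b'x\x9c\x03\x00\x00\x00\x00\x01'}])

    # pandas has no binary dtype, so the bytes sit in a generic
    # object column; to_sql maps object columns to a text type
    # unless a dtype override is supplied.
    print(df.dtypes)
    # FormID              int64
    # CompressedNotes    object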


Solution

  • As Gord Thompson's comment suggested, I changed the appendTable() method to check for any binary fields and, if there are any, pass that information to the df.to_sql() method through its dtype argument.

    So now the appendTable() method looks like this (I left in some print statements that I used to double-check that the values were as I expected):

    from sqlalchemy import LargeBinary  # needed for the dtype mapping below

    def appendTable(self, tableName, **kwargs):
        print("Checking Binary")
        binary_data = {}
        for keyword, value in kwargs.items():
            print(f"{keyword} Data Type: {type(value)}")
            if isinstance(value, (bytes, bytearray)):
                binary_data[keyword] = LargeBinary

        dataFrame = pd.DataFrame(data=[kwargs])
        print(dataFrame)

        with self.connection_handling():
            with threadLock:
                print(binary_data)
                if binary_data:
                    dataFrame.to_sql(tableName, con=self.connection.dbEngine, schema="dbo", index=False, if_exists='append', dtype=binary_data)
                else:
                    dataFrame.to_sql(tableName, con=self.connection.dbEngine, schema="dbo", index=False, if_exists='append')
    

    Here's the output from the print statements:

    Checking Binary
    FormId Data Type: <class 'int'>
    CompressedNotes Data Type: <class 'bytes'>
             FormId                                    CompressedNotes
    0           168  b'x\x9cUP\xb1N\xc30\x14\xdc\xf9\x8a\x87\x97.8\...
    {'CompressedNotes': <class 'sqlalchemy.sql.sqltypes.LargeBinary'>}

    With this change there was no error writing to the table, and I was able to load the data back out as expected.
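
    To double-check the round trip, the row can be read back and decompressed; a sketch, assuming the same dbEngine object used in appendTable():

        import zlib
        import pandas as pd

        # Pull the newly appended row back through the same engine.
        df = pd.read_sql(
            "SELECT TOP 1 CompressedNotes FROM dbo.Notebook ORDER BY NotebookID DESC",
            con=self.connection.dbEngine,
        )
        stored = df.at[0, "CompressedNotes"]  # comes back as bytes
        restored_html = zlib.decompress(stored).decode()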