I'm working on implementing hybrid search functionality using Azure SQL Database. I am trying to execute https://github.com/Azure-Samples/azure-sql-db-vector-search/blob/main/Hybrid-Search/hybrid_search.py and encountered the following error in my Python script:
pyodbc.DataError: ('22018', '[22018] [Microsoft][ODBC Driver 18 for SQL Server][SQL Server]Explicit conversion from data type ntext to vector is not allowed. (529) (SQLExecDirectW); [22018] [Microsoft][ODBC Driver 18 for SQL Server][SQL Server]Statement(s) could not be prepared. (8180)')
Following is the simple SQL Statement which is failing
INSERT INTO dbo.documents (id, content, embedding) VALUES (?, ?, CAST(? AS VECTOR(384)));
This error occurs when executing a SQL query that involves vector operations. It appears that the embeddings are being treated as ntext, and the current setup doesn't recognize the vector data type.
Environment Details:
Steps Taken:
The data you are getting is ntext data type, cast that involves converting an ntext
data type to another incompatible data type, like vector
. That may be the reason to get above error. To resolve error, Cast the data into NVARCHAR(MAX) after that cast it to VECTOR(384). Use below command to insert the data:
INSERT INTO dbo.documents (id, content, embedding)
VALUES (?, ?, CAST(CAST(? as NVARCHAR(MAX)) AS VECTOR(384)));
Then the code will execute successfully, without any error:
And the data will insert into the table successfully as shown below:
Here is the complete code for your reference;
import os
import pyodbc
import logging
import json
from sentence_transformers import SentenceTransformer
connection_string = (
"DRIVER={ODBC Driver 17 for SQL Server};"
"SERVER=<serverName>.database.windows.net;"
"DATABASE=<db>;"
"UID=<userName>;"
"PWD=<password>"
)
if __name__ == '__main__':
print('Initializing sample...')
print('Getting embeddings...')
sentences = [
'The dog is barking',
'The cat is purring',
'The bear is growling'
]
model = SentenceTransformer('multi-qa-MiniLM-L6-cos-v1') # returns a 384-dimensional vector
embeddings = model.encode(sentences)
# Establishing the connection using the hardcoded connection string
conn = pyodbc.connect(connection_string)
print('Cleaning up the database...')
try:
cursor = conn.cursor()
cursor.execute("DELETE FROM dbo.documents;")
cursor.commit()
finally:
cursor.close()
print('Saving documents and embeddings in the database...')
try:
cursor = conn.cursor()
for id, (content, embedding) in enumerate(zip(sentences, embeddings)):
cursor.execute(f"""
INSERT INTO dbo.documents (id, content, embedding)
VALUES (?, ?, CAST(cast(? as NVARCHAR(MAX)) AS VECTOR(384)));
""",
id,
content,
json.dumps(embedding.tolist())
)
cursor.commit()
finally:
cursor.close()