python sql azure vector azure-sql-database

Error: Explicit Conversion from Data Type ntext to vector Not Allowed in Azure SQL Database

I'm working on implementing hybrid search functionality using Azure SQL Database. I am trying to execute https://github.com/Azure-Samples/azure-sql-db-vector-search/blob/main/Hybrid-Search/hybrid_search.py and encountered the following error in my Python script:

pyodbc.DataError: ('22018', '[22018] [Microsoft][ODBC Driver 18 for SQL Server][SQL Server]Explicit conversion from data type ntext to vector is not allowed. (529) (SQLExecDirectW); [22018] [Microsoft][ODBC Driver 18 for SQL Server][SQL Server]Statement(s) could not be prepared. (8180)')

Following is the simple SQL Statement which is failing

INSERT INTO dbo.documents (id, content, embedding) VALUES (?, ?, CAST(? AS VECTOR(384)));

This error occurs when executing a SQL query that involves vector operations. It appears that the embeddings are being treated as ntext, and the current setup doesn't recognize the vector data type.

Environment Details:

Python version: 3.11.x
ODBC Driver: Microsoft ODBC Driver 18 for SQL Server
Azure SQL Database

Steps Taken:

Verified that the database schema defines the relevant column with the appropriate data type for storing vector embeddings.
Ensured that the Python environment is using the latest version (18) of the ODBC driver.
Confirmed that the SQL Server instance supports vector data types and operations.

Solution

enter image description here

The data you are getting is ntext data type, cast that involves converting an ntext data type to another incompatible data type, like vector . That may be the reason to get above error. To resolve error, Cast the data into NVARCHAR(MAX) after that cast it to VECTOR(384). Use below command to insert the data:

INSERT INTO dbo.documents (id, content, embedding)
VALUES (?, ?, CAST(CAST(? as NVARCHAR(MAX)) AS VECTOR(384)));

Then the code will execute successfully, without any error:

enter image description here

And the data will insert into the table successfully as shown below:

enter image description here

Here is the complete code for your reference;

import os
import pyodbc
import logging
import json
from sentence_transformers import SentenceTransformer

connection_string = (
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=<serverName>.database.windows.net;"
    "DATABASE=<db>;"
    "UID=<userName>;"
    "PWD=<password>"
)

if __name__ == '__main__':
    print('Initializing sample...')
    print('Getting embeddings...')
    sentences = [
        'The dog is barking',
        'The cat is purring',
        'The bear is growling'
    ]
    model = SentenceTransformer('multi-qa-MiniLM-L6-cos-v1')  # returns a 384-dimensional vector
    embeddings = model.encode(sentences)

    # Establishing the connection using the hardcoded connection string
    conn = pyodbc.connect(connection_string)

    print('Cleaning up the database...')
    try:
        cursor = conn.cursor()
        cursor.execute("DELETE FROM dbo.documents;")
        cursor.commit()
    finally:
        cursor.close()

    print('Saving documents and embeddings in the database...')
    try:
        cursor = conn.cursor()

        for id, (content, embedding) in enumerate(zip(sentences, embeddings)):
            cursor.execute(f"""
                INSERT INTO dbo.documents (id, content, embedding) 
                VALUES (?, ?, CAST(cast(? as NVARCHAR(MAX)) AS VECTOR(384)));
            """,
            id,
            content, 
            json.dumps(embedding.tolist())
            )

        cursor.commit()
    finally:
        cursor.close()