Tags: azure, azure-cognitive-search, azure-ai-search

How to use Python to perform vector search or hybrid search on Azure AI Search?


As the title says.

My setup is as follows: I selected "Import and vectorize data" in the Azure AI Search portal, which gave me an index with vector values. I normally use Python to query Azure AI Search.

My Python code is as follows:

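from azure.core.credentials import AzureKeyCredential
from azure.search.documents import SearchClient

# endpoint, index_name and key are the values for your own search service
credential = AzureKeyCredential(key)
search_client = SearchClient(
    endpoint=endpoint,
    index_name=index_name,
    credential=credential
)

text = input("Qes:")
# Keyword-only search: just the query text, no vector query
results = search_client.search(search_text=text, select="title")

for ans in results:
    print(ans)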

How do I perform a vector search or hybrid search in Python with this setup?


Solution

  • Posting my comments as an answer for the benefit of the community.

    You can check this GitHub link, which follows the steps below to perform a vector search:

    1. Generate Embeddings: Start by reading your data and generating embeddings using OpenAI. Once generated, export these embeddings into a format suitable for insertion into your Azure AI Search index.

    2. Set Up Search Index: Create the schema for your search index and configure vector search settings according to your requirements.

    3. Add Text and Embeddings to Index: Populate your vector store with the text data and corresponding metadata from your JSON dataset.

    4. Conduct Vector Similarity Search: Use the code below to perform a vector similarity search. It embeds the text query with the same embedding model that was used for the index and passes the resulting vector to the search in a VectorizedQuery, so the query is vectorized on the client side.

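    from azure.search.documents.models import VectorizedQuery

    # Assumes `client` is an OpenAI (or Azure OpenAI) client and `embedding_model_name`
    # is the embedding model that was used to populate the index.
    query = "tools for software development"

    # Embed the query text, then ask for the 3 nearest neighbors in the vector field.
    # "contentVector" and the selected fields must match your own index schema.
    embedding = client.embeddings.create(input=query, model=embedding_model_name).data[0].embedding
    vector_query = VectorizedQuery(vector=embedding, k_nearest_neighbors=3, fields="contentVector")

    results = search_client.search(
        search_text=None,  # no keyword text: pure vector search
        vector_queries=[vector_query],
        select=["title", "content", "category"],
    )

    for result in results:
        print(f"Title: {result['title']}")
        print(f"Score: {result['@search.score']}")
        print(f"Content: {result['content']}")
        print(f"Category: {result['category']}\n")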
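    Since the index in the question was built with "Import and vectorize data", it most likely has a vectorizer attached, in which case the service can embed the query text itself. The following is a minimal sketch of that variant, assuming a recent azure-search-documents release that ships VectorizableTextQuery (11.6.0 or later, as far as I know). The helper names search_kwargs and text_vector_search and the field name "text_vector" are illustrative assumptions, not names from the original answer; check your index definition for the real vector field name.

```python
from typing import Any, Dict


def search_kwargs(query: str, vector_query: Any, hybrid: bool, top: int = 3) -> Dict[str, Any]:
    """Build keyword arguments for SearchClient.search() (hypothetical helper)."""
    return {
        # Pure vector search sends no keyword text; hybrid search sends the query text too.
        "search_text": query if hybrid else None,
        "vector_queries": [vector_query],
        "top": top,
    }


def text_vector_search(search_client, query: str, vector_field: str = "text_vector",
                       hybrid: bool = False, top: int = 3):
    """Run a vector or hybrid search, letting the index vectorizer embed the query.

    Assumes the index has a vectorizer configured for `vector_field`;
    "text_vector" is only a guess at the field name.
    """
    # Imported here so search_kwargs() stays usable without the SDK installed.
    from azure.search.documents.models import VectorizableTextQuery

    vector_query = VectorizableTextQuery(
        text=query,
        k_nearest_neighbors=top,
        fields=vector_field,
    )
    return search_client.search(**search_kwargs(query, vector_query, hybrid, top))
```

    With the search_client from the question, text_vector_search(search_client, text) would run a pure vector query and text_vector_search(search_client, text, hybrid=True) a hybrid one, with no separate embedding call on the client side.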
    

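    Below is the code for hybrid search. It passes the query text as search_text together with the vector query, so keyword and vector results are combined in one ranked list: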

    # Uses VectorizedQuery, `client`, `embedding_model_name` and `search_client`
    # as set up for the vector search code.
    query = "scalable storage solution"

    embedding = client.embeddings.create(input=query, model=embedding_model_name).data[0].embedding
    vector_query = VectorizedQuery(vector=embedding, k_nearest_neighbors=3, fields="contentVector")

    results = search_client.search(
        search_text=query,  # keyword part of the hybrid query
        vector_queries=[vector_query],  # vector part of the hybrid query
        select=["title", "content", "category"],
        top=3  # return only the 3 best merged results
    )

    for result in results:
        print(f"Title: {result['title']}")
        print(f"Score: {result['@search.score']}")
        print(f"Content: {result['content']}")
        print(f"Category: {result['category']}\n")
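
    The score printed for hybrid results is a fused score: Azure AI Search merges the keyword ranking and the vector ranking with Reciprocal Rank Fusion (RRF). The pure-Python sketch below only illustrates the idea. The function name rrf_merge, the constant k=60 and the sample document ids are assumptions made for illustration, and this is not the service's actual implementation.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def rrf_merge(rankings: Sequence[Sequence[str]], k: int = 60) -> List[Tuple[str, float]]:
    """Fuse several ranked lists of document ids with Reciprocal Rank Fusion.

    Each document earns 1 / (k + rank) from every list it appears in
    (rank starts at 1), and the contributions are summed.
    """
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical rankings for one query
keyword_ranking = ["a", "b", "c"]
vector_ranking = ["b", "c", "d"]

merged = rrf_merge([keyword_ranking, vector_ranking])
for doc_id, score in merged:
    print(f"{doc_id}: {score:.4f}")
```

    Documents that rank well in both lists ("b" and "c" here) end up ahead of documents that appear in only one list, which is why hybrid scores are not directly comparable with pure keyword or pure vector scores.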