Tags: azure, real-time, azure-ai-search

Are Azure AI Search Indexers useful for real-time embedding?


We have a chatbot in which users can upload files. These files are added to a RAG pipeline in real time. Currently we use an Azure Container App to extract text from the uploaded files; the text is then embedded and saved into an Azure AI Search index. Our goal is to get rid of the text extraction process in the container app. My plan is to set up an Azure Cognitive Search indexer that checks Azure Blob Storage for changes (or is triggered by us for each change), extracts the text via a skill, and indexes/embeds every new element.

What I want to know is whether this approach is feasible for real-time usage. That is, when a user uploads a file to blob storage, can I be sure that the file embedding starts right away so the user can interact with it in the RAG? Or is the indexer only usable for non-real-time applications, where it is acceptable for a file to be embedded within a few hours?


Solution

  • You can schedule the indexer to run at a fixed time interval.
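
To illustrate the scheduling option: the schedule lives in the indexer definition as an ISO 8601 duration, and as far as I know the shortest interval the service accepts is five minutes, so a schedule alone will not give near-instant indexing. Below is a minimal sketch of building that part of the definition. The helper names `schedule_interval` and `with_schedule` and the placeholder resource names are hypothetical, and the 5 to 1440 minute range should be checked against the documentation for your API version.

```python
def schedule_interval(minutes: int) -> str:
    """Format a run interval as an ISO 8601 duration such as 'PT5M' or 'PT2H'."""
    # Assumption: the service accepts intervals from 5 minutes up to 24 hours.
    if not 5 <= minutes <= 1440:
        raise ValueError("interval must be between 5 and 1440 minutes")
    hours, mins = divmod(minutes, 60)
    duration = "PT"
    if hours:
        duration += f"{hours}H"
    if mins:
        duration += f"{mins}M"
    return duration


def with_schedule(indexer_definition: dict, minutes: int) -> dict:
    """Return a copy of an indexer definition with a 'schedule' section added."""
    updated = dict(indexer_definition)
    updated["schedule"] = {"interval": schedule_interval(minutes)}
    return updated


if __name__ == "__main__":
    # Hypothetical placeholder names; a real definition would typically also
    # carry a skillset name and field mappings.
    definition = {
        "name": "your-indexer-name",
        "dataSourceName": "your-blob-datasource",
        "targetIndexName": "your-index-name",
    }
    print(with_schedule(definition, 5))
```

The resulting dictionary would be sent as the JSON body when creating or updating the indexer.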

    OR

    If you want the data indexed in real time when a file arrives in blob storage, you can use an Azure Function App with a blob trigger and make a REST API request to run the indexer.

    Refer to the Azure Functions documentation on the Blob storage trigger to create the function app.

    Whenever a new file arrives, this function runs and you make a REST API request like the one below.

    import requests
    
    def run_indexer(service_name, indexer_name, api_version, admin_key):
        """
        Trigger an Azure Cognitive Search Indexer to run on demand.
    
        Parameters:
        - service_name (str): The unique name of your Azure Cognitive Search service.
        - indexer_name (str): The name of the indexer to run.
        - api_version (str): The API version to use (e.g., '2020-06-30').
        - admin_key (str): The admin key for your Azure Cognitive Search service.
    
        Returns:
        - Response object from the API request, or None if the request failed.
        """
        
        # Run Indexer endpoint: POST /indexers/{name}/run
        url = f"https://{service_name}.search.windows.net/indexers/{indexer_name}/run?api-version={api_version}"
    
        headers = {
            "Content-Type": "application/json",
            "api-key": admin_key
        }
    
        try:
            # A timeout keeps the function from hanging if the service is unreachable.
            response = requests.post(url, headers=headers, timeout=30)
    
            # 202 Accepted means the run was queued, not that indexing has finished.
            if response.status_code == 202:
                print("Indexer run successfully triggered.")
            else:
                print(f"Failed to trigger indexer. Status Code: {response.status_code}, Response: {response.text}")
    
            return response
    
        except requests.RequestException as e:
            print(f"An error occurred: {e}")
            return None
    
    
    if __name__ == "__main__":
        SERVICE_NAME = "your-service-name"  # Replace with your service name
        INDEXER_NAME = "your-indexer-name"  # Replace with your indexer name
        API_VERSION = "2020-06-30"  # Use the appropriate API version
        ADMIN_KEY = "your-admin-key"  # Replace with your admin key
    
        run_indexer(SERVICE_NAME, INDEXER_NAME, API_VERSION, ADMIN_KEY)
    
    

    You can adapt the code above and add it to the function app accordingly.
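
Triggering the run only tells you the request was accepted. To know when the uploaded file is actually searchable, you could poll the indexer status endpoint until a new run has finished. The following is a rough sketch using only the standard library, not a definitive implementation. The helper names `fetch_indexer_status` and `wait_for_new_run`, the timeout, and the poll interval are my own assumptions, and the `lastResult.status` and `lastResult.startTime` fields should be verified against the Get Indexer Status reference for your API version.

```python
import json
import time
import urllib.request
from typing import Callable, Optional


def fetch_indexer_status(service_name: str, indexer_name: str,
                         api_version: str, admin_key: str) -> dict:
    """GET the indexer status document and return it as a dict."""
    url = (f"https://{service_name}.search.windows.net/indexers/"
           f"{indexer_name}/status?api-version={api_version}")
    request = urllib.request.Request(url, headers={"api-key": admin_key})
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)


def wait_for_new_run(fetch: Callable[[], dict],
                     previous_start: Optional[str],
                     timeout_s: float = 300.0,
                     poll_s: float = 5.0,
                     sleep: Callable[[float], None] = time.sleep,
                     clock: Callable[[], float] = time.monotonic) -> Optional[str]:
    """Poll until a run newer than previous_start has finished.

    Returns the final status string (for example 'success'), or None if the
    timeout expires first. previous_start is the lastResult.startTime read
    before triggering, so a stale result from an older run is not mistaken
    for the run that was just requested.
    """
    deadline = clock() + timeout_s
    while True:
        last = fetch().get("lastResult") or {}
        is_new_run = last.get("startTime") != previous_start
        # Assumption: 'inProgress' is the status reported while a run is active.
        if is_new_run and last.get("status") not in (None, "inProgress"):
            return last["status"]
        if clock() >= deadline:
            return None
        sleep(poll_s)
```

In the function app, one would read the status once before calling `run_indexer` to capture `previous_start`, then call `wait_for_new_run(lambda: fetch_indexer_status(...), previous_start)` and only tell the user the file is ready when it returns `"success"`. Passing `fetch`, `sleep`, and `clock` as parameters is a design choice that lets the loop be tested without network access.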