Tags: azure, indexing, openai-api, azure-openai, azure-ai-search

Azure AI Search MultiIndex / Conditional Semantic search


I'm working with Azure OpenAI to create a chatbot for retrieving data from massive enterprise textual data. At my editorial company, I'm facing a challenge with Azure AI Search. Initially, all the data was in a single index, but now I need to separate it into three different indices due to conditional search requirements. Here are the details:

  • Index 1: Biology Index (private, FR)
  • Index 2: Engineering and Technology Index (EN)
  • Index 3: Art and Architecture Index (USA, UK)

These indices contain various data sources and publications, and there is overlap in topics across them. For example, when querying about anatomy-related topics like eyesight, cardiovascular diseases, or growth hormone therapy, I want these queries, and queries on related biological topics, to retrieve data exclusively from the Biology Index (Index 1).

My Python code retrieves accurate data with a single index, but I'm looking for a solution within Azure AI Search to prioritize specific indices based on query context.

For example:

  • Queries related to biology should exclusively retrieve data from indices 1 and 2.

  • Queries related to technology, data science, and AI should exclusively retrieve data from index 2.

I haven't come across a service or GitHub repository that directly addresses this specific requirement. I know that Azure does not allow multi-index search.

How can I find a solution or workaround?

This is the code I use for RAG retrieval:

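import os

from azure.core.credentials import AzureKeyCredential
from azure.search.documents import SearchClient

# Assumed setup: the endpoint and key are read from environment variables
# (the variable names are placeholders, not from the original post)
endpoint = os.environ['AZURE_SEARCH_ENDPOINT']
admin_key = os.environ['AZURE_SEARCH_ADMIN_KEY']

index_name = 'indx-editorials-bio-fr-old'

# Query to execute
query = 'Please retrieve publications from editorial certified houses covering cardiovascular diseases'

# Function to execute the query with semantic ranking
def execute_query_with_semantic_ranking():
    try:
        # Create a SearchClient for the index
        credential = AzureKeyCredential(admin_key)
        client = SearchClient(endpoint=endpoint, index_name=index_name, credential=credential)

        # Execute the query with semantic ranking. In azure-search-documents 11.4+,
        # this is requested with query_type plus the name of a semantic configuration
        # defined in the index ("article-semantic" in the definition below).
        results = client.search(
            search_text=query,
            query_type="semantic",
            semantic_configuration_name="article-semantic",
        )

        # Print the results
        print(f"Results from index '{index_name}' with semantic ranking:")
        for result in results:
            print(result)
        print()

    except Exception as e:
        print(f"Error querying index '{index_name}' with semantic ranking: {e}")

# Execute the query with semantic ranking
execute_query_with_semantic_ranking()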

Index Definition:

{
  "@odata.context": "search.windows.net",
  "@odata.etag": "\"123547858WRF\"",
  "name": "all_articles_index",
  "defaultScoringProfile": null,
  "fields": [
    {
      "name": "content",
      "type": "Edm.String",
      "searchable": true,
      "filterable": true,
      "retrievable": true,
      "stored": true,
      "sortable": true,
      "facetable": false,
      "key": false,
      "indexAnalyzer": null,
      "searchAnalyzer": null,
      "analyzer": null,
      "normalizer": null,
      "dimensions": null,
      "vectorSearchProfile": null,
      "vectorEncoding": null,
      "synonymMaps": []
    },
    {
      "name": "title",
      "type": "Edm.String",
      "searchable": true,
      "filterable": true,
      "retrievable": true,
      "stored": true,
      "sortable": true,
      "facetable": false,
      "key": false,
      "indexAnalyzer": null,
      "searchAnalyzer": null,
      "analyzer": null,
      "normalizer": null,
      "dimensions": null,
      "vectorSearchProfile": null,
      "vectorEncoding": null,
      "synonymMaps": []
    },
    {
      "name": "doi",
      "type": "Edm.String",
      "searchable": false,
      "filterable": false,
      "retrievable": true,
      "stored": true,
      "sortable": false,
      "facetable": false,
      "key": false,
      "indexAnalyzer": null,
      "searchAnalyzer": null,
      "analyzer": null,
      "normalizer": null,
      "dimensions": null,
      "vectorSearchProfile": null,
      "vectorEncoding": null,
      "synonymMaps": []
    },
    {
      "name": "editorial_house",
      "type": "Edm.String",
      "searchable": false,
      "filterable": false,
      "retrievable": true,
      "stored": true,
      "sortable": false,
      "facetable": false,
      "key": false,
      "indexAnalyzer": null,
      "searchAnalyzer": null,
      "analyzer": null,
      "normalizer": null,
      "dimensions": null,
      "vectorSearchProfile": null,
      "vectorEncoding": null,
      "synonymMaps": []
    },
    {
      "name": "metadata_storage_path",
      "type": "Edm.String",
      "searchable": false,
      "filterable": false,
      "retrievable": true,
      "stored": true,
      "sortable": false,
      "facetable": false,
      "key": true,
      "indexAnalyzer": null,
      "searchAnalyzer": null,
      "analyzer": null,
      "normalizer": null,
      "dimensions": null,
      "vectorSearchProfile": null,
      "vectorEncoding": null,
      "synonymMaps": []
    }
  ],
  "scoringProfiles": [],
  "corsOptions": null,
  "suggesters": [],
  "analyzers": [],
  "normalizers": [],
  "tokenizers": [],
  "tokenFilters": [],
  "charFilters": [],
  "encryptionKey": null,
  "similarity": {
    "@odata.type": "BM25Similarity",
    "k1": null,
    "b": null
  },
  "semantic": {
    "defaultConfiguration": null,
    "configurations": [
      {
        "name": "article-semantic",
        "prioritizedFields": {
          "titleField": {
            "fieldName": "title"
          },
          "prioritizedContentFields": [
            {
              "fieldName": "content"
            }
          ],
          "prioritizedKeywordsFields": []
        }
      }
    ]
  },
  "vectorSearch": null
}

Sample Data

[
  {
    "content": "This article explores the potential of AI to revolutionize genomics, highlighting recent breakthroughs and future prospects.",
    "title": "The Impact of AI on Genomics: Recent Breakthroughs and Future Prospects",
    "doi": "10.1234/ai-bio-2024-001",
    "editorial_house": "BioTech Publishers",
    "metadata_storage_path": "/articles/2024/ai-bio-2024-001"
  },
  {
    "content": "In this study, we discuss the integration of machine learning in drug discovery processes, focusing on its benefits and challenges.",
    "title": "Machine Learning in Drug Discovery: Benefits and Challenges",
    "doi": "10.1234/ai-bio-2024-002",
    "editorial_house": "BioTech Publishers",
    "metadata_storage_path": "/articles/2024/ai-bio-2024-002"
  },
  {
    "content": "This paper examines the role of AI in ecological monitoring, presenting case studies on wildlife conservation efforts.",
    "title": "AI in Ecological Monitoring: Wildlife Conservation Case Studies",
    "doi": "10.1234/ai-bio-2024-003",
    "editorial_house": "BioTech Publishers",
    "metadata_storage_path": "/articles/2024/ai-bio-2024-003"
  },
  {
    "content": "The article reviews advances in bioinformatics driven by AI, with a focus on data analysis techniques and their applications.",
    "title": "Advances in Bioinformatics: AI-Driven Data Analysis Techniques",
    "doi": "10.1234/ai-bio-2024-004",
    "editorial_house": "BioTech Publishers",
    "metadata_storage_path": "/articles/2024/ai-bio-2024-004"
  },
  {
    "content": "This study highlights the use of AI in personalized medicine, detailing the technology's impact on treatment plans and patient outcomes.",
    "title": "Personalized Medicine: AI's Role in Tailoring Treatment Plans",
    "doi": "10.1234/ai-bio-2024-005",
    "editorial_house": "BioTech Publishers",
    "metadata_storage_path": "/articles/2024/ai-bio-2024-005"
  }
]

Solution

  • Yes, as you said, a single query cannot span multiple indexes. One possible approach for your problem is described below.

    Alongside the three new indexes, create a fourth index whose fields are the content, the topic, and the name of the index that holds that content.

    Sample data

    [
      {
        "index_name": "Biology Index",
        "content": "All of your content having the topic about biology"
      },
      {
        "index_name": "Engineering and Technology Index",
        "content": "All of your content having the topic about Engineering and Technology"
      },
      {
        "index_name": "Art and Architecture Index",
        "content": "All of your content having the topic about Art and Architecture"
      }
    ]
    

    So, create the fourth index with data like the sample above. If a topic has more than one document, combine them and put the combined text in the content field.
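The combining step can be sketched in plain Python. This is only an illustration under assumptions: `build_routing_documents` and the `id` field are hypothetical names (an Azure AI Search index needs one key field, which the sample data above does not show), and the input is assumed to be a dict mapping each index name to its list of articles with `title` and `content` fields.

```python
import re


def build_routing_documents(articles_by_index):
    """Combine each target index's articles into one routing document."""
    routing_docs = []
    for index_name, articles in articles_by_index.items():
        # Join all articles of one topic into a single searchable text
        combined = "\n".join(f"{a['title']}. {a['content']}" for a in articles)
        routing_docs.append({
            # Hypothetical key field: document keys may only contain
            # letters, digits, '_', '-' and '='
            "id": re.sub(r"[^A-Za-z0-9_\-=]", "-", index_name),
            "index_name": index_name,
            "content": combined,
        })
    return routing_docs
```

The resulting list could then be uploaded with `SearchClient.upload_documents(documents=routing_docs)` on a client pointed at the fourth index. For a very large corpus, a summary or keyword list per topic may be more practical than the full combined text.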

    Next, run the user's query against this fourth index, take the index name from the result with the highest @search.score, and use that index in your Python code for the actual query.
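A minimal sketch of this two-step routing follows. The function names `pick_target_index` and `routed_search` and the `make_client` factory are hypothetical; the sketch assumes the `index_name` field of the routing index stores the actual Azure index name (for example `indx-editorials-bio-fr-old`) rather than a display name. It relies on search results being returned in descending score order by default, so `top=1` yields the highest-scoring routing document.

```python
def pick_target_index(routing_client, query, default_index=None):
    """Return the index_name of the best-scoring routing document."""
    results = routing_client.search(search_text=query, select=["index_name"], top=1)
    best = next(iter(results), None)
    return best["index_name"] if best is not None else default_index


def routed_search(make_client, routing_index, query, **search_kwargs):
    """Route the query via the routing index, then search the chosen index.

    make_client(index_name) must return an object with a .search(...) method,
    such as azure.search.documents.SearchClient.
    """
    target = pick_target_index(make_client(routing_index), query)
    if target is None:
        return None, []
    results = make_client(target).search(search_text=query, **search_kwargs)
    return target, list(results)


# With the real SDK (assumed setup, not executed here):
#   from azure.core.credentials import AzureKeyCredential
#   from azure.search.documents import SearchClient
#   make_client = lambda name: SearchClient(endpoint, name, AzureKeyCredential(admin_key))
#   target, hits = routed_search(make_client, "routing-index", query,
#                                query_type="semantic",
#                                semantic_configuration_name="article-semantic")
```

Passing a client factory instead of a fixed client keeps the routing logic independent of the SDK, so it can be exercised with a stub before pointing it at a live search service. If a query should reach more than one index, the same idea extends to taking the top few routing results and querying each returned index in turn.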