azure, azure-openai

How to add Data Lake Gen 2 ACL information to Azure AI search index (AI?


I have deployed the chat interface for Azure OpenAI on your data and had the indexes and indexers created automatically by the Azure OpenAI Studio wizard, including vector search. My document data source is an Azure Storage Data Lake Gen 2. Now I want to limit access to documents to specific Entra user groups that are assigned to blobs via ACLs. I have read all the documentation I could find on this, including https://techcommunity.microsoft.com/t5/ai-azure-ai-services-blog/access-control-in-generative-ai-applications-with-azure/ba-p/3956408 and the linked sample scripts.

How can I extend the indexes and indexer generated by the AOAI Studio wizard to include ACL information from Data Lake Gen 2? From my current standpoint, it looks like all that is missing is a simple field import from the blob to the index's group_ids field.

I would prefer not to do the document preprocessing myself if possible, because Microsoft already implements it out of the box.


Solution

  • I don't think there is a direct out-of-the-box solution for integrating Azure Data Lake Storage Gen2 (ADLS Gen2) ACLs, so I'll suggest a workaround; I hope it helps. First, get the ACL details using the Azure Storage SDKs or REST APIs. That gives you the permissions for each blob.

    Now the tricky part: building a custom process. You'll want to map these ACL details to your Entra user groups and structure the data so that Azure AI Search can use it, for example as the list of group IDs that goes into the group_ids field of each document.

    Once that's done, update your Azure AI Search index. Use the Azure AI Search REST API or SDKs to update the documents in your existing index with the ACL data. Also make sure you've added the necessary ACL-related fields to your index schema.

    I'll share a sample snippet below in Python:

    # Fetch ACLs, process data, and update the Azure AI Search index
    # ACLs are exposed by the Data Lake SDK (azure-storage-file-datalake), not the Blob SDK
    from azure.storage.filedatalake import DataLakeServiceClient
    from azure.search.documents import SearchClient
    from azure.core.credentials import AzureKeyCredential
    
    # Fetch ACLs and process data (fill in the blanks)
    # ...
    
    # Connect to the Azure AI Search index
    search_service_name = "your-search-service-name"
    index_name = "your-index-name"
    api_key = "your-search-service-api-key"
    
    # SearchClient works on the documents of one index
    # (azure.search.documents has no SearchServiceClient)
    search_client = SearchClient(
        endpoint=f"https://{search_service_name}.search.windows.net",
        index_name=index_name,
        credential=AzureKeyCredential(api_key),
    )
    
    # Update documents in the index with ACL information (customize this part)
    # e.g. search_client.merge_documents([...]) with each document's key and its group_ids
    # ...
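
The "fill in the blanks" parts above could look roughly like the sketch below. It is not an official Microsoft sample, and it makes several assumptions: the ACL arrives as the comma-separated string that ADLS Gen2 reports (entries such as `group:<object-id>:r-x`), the index key field is called `id`, and `group_ids` is a filterable string collection in the index. The helper names `readable_group_ids`, `acl_merge_document`, and `security_filter` are made up for illustration.

```python
def readable_group_ids(acl: str) -> list[str]:
    """Return the IDs of named groups that have effective read access."""
    entries = [e.strip().split(":") for e in acl.split(",") if e.strip()]
    # Keep access entries only; "default:..." entries have four parts and
    # only act as templates for newly created child items.
    access = [e for e in entries if len(e) == 3]
    # The mask, when present, caps the permissions of named groups.
    mask = next((perms for kind, _, perms in access if kind == "mask"), "rwx")
    if mask[:1] != "r":
        return []
    # Named groups carry an ID; the owning group ("group::...") does not.
    return [
        ident
        for kind, ident, perms in access
        if kind == "group" and ident and perms[:1] == "r"
    ]


def acl_merge_document(doc_key: str, acl: str, key_field: str = "id") -> dict:
    """Build a partial document that sets group_ids on an existing index entry.

    key_field is an assumption; use the key field name of your own index.
    """
    return {key_field: doc_key, "group_ids": readable_group_ids(acl)}


def security_filter(user_group_ids: list[str]) -> str:
    """Build an OData filter that keeps documents shared with any given group."""
    joined = ", ".join(user_group_ids)
    return f"group_ids/any(g:search.in(g, '{joined}'))"
```

The ACL string itself can be read with the `get_access_control()` method of a file client from `azure-storage-file-datalake`, which returns a dictionary with an `acl` entry. The dictionaries built by `acl_merge_document` can then be passed as a list to `search_client.merge_documents(...)`, and the string from `security_filter` can be passed as the `filter` argument of `search_client.search(...)` at query time. If the wizard split each blob into several chunk documents, every chunk would need the same `group_ids` value, so the merge has to be repeated per chunk key.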