python, azure, pagination, azure-sdk

Handle pagination in Python when interacting with the Azure Resource Graph API


I am retrieving the tags of all the resource groups in my tenant with an Azure Resource Graph query. The query works perfectly in Azure Resource Graph Explorer in the portal.

Here is the query:

resourcecontainers
| where type == 'microsoft.resources/subscriptions/resourcegroups'
| extend dates=format_datetime(now(), "yyyy-MM-dd")
| join kind=leftouter (
    resourcecontainers
    | where type == 'microsoft.resources/subscriptions'
    | project SubscriptionName=name, subscriptionId)
    on subscriptionId
| project SubscriptionName, subscriptionId, resourceGroup, 
    financial_contact=tags.financial_contact, security_contact=tags.security_contact
     

The portal returns all the results (more than 2,000 resource groups).

When I run the same query from my Python script, the response stops at 530 records. Here is my script:

from azure.identity import DefaultAzureCredential
from azure.mgmt.resourcegraph import ResourceGraphClient
from azure.mgmt.resource import ResourceManagementClient
from azure.mgmt.resourcegraph.models import *
import json

# Initialize Azure credentials
credentials = DefaultAzureCredential()

# Initialize Resource Graph client
resource_graph_client = ResourceGraphClient(credentials)
skip = 0
result = []


query_code = f"""
resourcecontainers
| where type == 'microsoft.resources/subscriptions/resourcegroups'
| extend dates=format_datetime(now(), "yyyy-MM-dd")
| join kind=leftouter (
    resourcecontainers
    | where type == 'microsoft.resources/subscriptions'
    | project SubscriptionName=name, subscriptionId)
    on subscriptionId
| project SubscriptionName, subscriptionId, resourceGroup, 
   financial_contact=tags.financial_contact, security_contact=tags.security_contact,
    environment=tags.environment,
    version=tags.version, dates, type, location, id_prefix=id
"""


query = QueryRequest(
            query= query_code 
)
query_response = resource_graph_client.resources(query)
query_response_str = str(query_response)
json_data = json.dumps(query_response_str)

json_data = json.loads(json_data)



output_file = "resource_groups_tags.txt"
with open(output_file, "w") as f:
    json.dump(json_data, f, indent=4)

Here is the first part of the response:

{'additional_properties': {}, 'total_records': 530, 'count': 530, 'result_truncated': 'false', 'skip_token': None, 'data': [{'SubscriptionName': '

I can't figure out how to handle pagination to get all the results, since the query itself has no skip/offset. The Microsoft documentation mentions a 'skip_token', but I did not find the explanation clear, and in my response it is set to None.

Can someone help with this?

I tried skip and limit, but skip did not work together with limit, so I don't see how to handle it.


Solution

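  • I found the solution. I don't know why the result was limited to 530 records; the limit has since changed to 1000, and I am now getting a skip_token value in the response. Passing that token back through QueryRequestOptions on the next request fetches the next page, and the loop stops once no token is returned.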

    Here is the code I use:

    from azure.identity import DefaultAzureCredential
    from azure.mgmt.resourcegraph import ResourceGraphClient
    from azure.mgmt.resourcegraph.models import QueryRequest, QueryRequestOptions
    
    def get_tags(tenant: str):
        # Initialize Azure credentials
        credentials = DefaultAzureCredential()
    
        # Initialize Resource Graph client
        resource_graph_client = ResourceGraphClient(credentials)
        results = []
    
    
        query_code = f"""
        resourcecontainers
        | where type == 'microsoft.resources/subscriptions/resourcegroups'
        | extend dates=format_datetime(now(), "yyyy-MM-dd")
        | join kind=leftouter (
            resourcecontainers
            | where type == 'microsoft.resources/subscriptions'
            | project SubscriptionName=name, subscriptionId)
            on subscriptionId
        | project SubscriptionName, subscriptionId, resourceGroup, 
           financial_contact=tags.financial_contact, security_contact=tags.security_contact,
           environment=tags.environment,
            version=tags.version, dates, type, location, id_prefix=id
        """
        
    
        skip_token = None  # no token is sent with the first request
        page_count = 0

        while True:
            # Pass the token from the previous response to request the next page.
            query = QueryRequest(
                query=query_code,
                options=QueryRequestOptions(skip_token=skip_token),
            )
            query_response = resource_graph_client.resources(query)

            # Each item in data is a dict keyed by the projected column names.
            for row in query_response.data:
                results.append({
                    'environment': row.get('environment'),
                    'security_contact': row.get('security_contact'),
                    'subscription': row.get('SubscriptionName'),
                    'subscription_id': row.get('subscriptionId'),
                    'resource_group': row.get('resourceGroup'),
                    'tenant': tenant,
                })

            page_count += 1
            skip_token = query_response.skip_token

            # A missing token means the last page has been reached.
            if not skip_token:
                break
        print(page_count)  # number of pages fetched
    
        return results
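
The skip-token loop above can also be separated from the Azure client, which makes it easy to try out without credentials. The following is only a minimal sketch of the same pattern: `iter_rows`, `make_fake_fetcher`, and the `FetchPage` callable type are hypothetical names introduced for illustration, and the fake fetcher is an in-memory stand-in, not part of the Azure SDK.

```python
from typing import Any, Callable, Dict, Iterator, List, Optional, Tuple

Row = Dict[str, Any]
# Hypothetical page-fetcher type: takes the previous skip token
# (None for the first page) and returns (rows, next_skip_token).
FetchPage = Callable[[Optional[str]], Tuple[List[Row], Optional[str]]]


def iter_rows(fetch_page: FetchPage) -> Iterator[Row]:
    """Yield every row, following skip tokens until none is returned."""
    skip_token: Optional[str] = None
    while True:
        rows, skip_token = fetch_page(skip_token)
        yield from rows
        if not skip_token:
            break


def make_fake_fetcher(rows: List[Row], page_size: int) -> FetchPage:
    """In-memory stand-in for the service (assumption: the token is just
    the next offset as a string; real tokens are opaque)."""
    def fetch_page(skip_token: Optional[str]) -> Tuple[List[Row], Optional[str]]:
        start = int(skip_token) if skip_token else 0
        end = start + page_size
        next_token = str(end) if end < len(rows) else None
        return rows[start:end], next_token
    return fetch_page


if __name__ == "__main__":
    fake_rows = [{"resourceGroup": f"rg-{i}"} for i in range(2300)]
    fetcher = make_fake_fetcher(fake_rows, page_size=1000)
    print(len(list(iter_rows(fetcher))))
```

To use this with the real client, the fetcher would wrap the same call as in the answer: build a QueryRequest with QueryRequestOptions(skip_token=...), call resource_graph_client.resources(query), and return the response's data and skip_token.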