I have been reading about Durable Functions in C# and long-running tasks, and I am trying to work out how they could handle a long-running Graph API call (for example, listing a Group's Users). I understand that the Graph call (made with the C# Graph SDK) returns its results in pages, and I am trying to think of a way for an orchestration to keep processing work from each next-page request until there are no more pages.
A few ideas I had were:
1. Persist the next-page token from each Graph call in an external data source. When the orchestration wakes up, it reads this token and uses it to call an activity function, repeating until the returned token is null. But I am concerned this goes against the rule that orchestrator functions must be deterministic.
2. Use a Page Iterator https://learn.microsoft.com/en-us/graph/sdks/paging?tabs=csharp and call an activity function on each iteration to process that page's results (maybe using the fan-out/fan-in pattern, while keeping Graph throttling in mind).
3. Get all the pages of Users first in one activity function and process them in a foreach loop. This is simple, but I am worried about holding everything in memory and about problems with a large result set.
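The determinism concern in the first idea can be avoided by keeping the continuation token out of external storage and letting it flow through activity results instead. Durable Functions records each activity result in the orchestration history and replays it, so a loop whose only changing input is an activity's return value takes the same path on every replay. Below is a minimal plain-Python simulation of that idea, not real Durable Functions code: fetch_page_activity, process_page_activity, the toy run driver and the fake 2,500-user data set are all hypothetical. In the C# SDK, each yield would correspond roughly to an awaited context.CallActivityAsync call.

```python
from typing import Generator, Optional

# Hypothetical stand-in for a Graph collection: 2,500 fake user IDs.
FAKE_USERS = [f"user-{i}" for i in range(2500)]
PAGE_SIZE = 999

def fetch_page_activity(token: Optional[str]) -> dict:
    """Hypothetical activity: returns one page plus the token for the next page.
    A real activity would call Graph and return its next-page link instead."""
    start = int(token) if token else 0
    items = FAKE_USERS[start:start + PAGE_SIZE]
    end = start + len(items)
    return {"items": items, "next": str(end) if end < len(FAKE_USERS) else None}

def process_page_activity(items: list) -> int:
    """Hypothetical activity: does the per-page work and reports how many items it handled."""
    return len(items)

ACTIVITIES = {"fetch_page": fetch_page_activity, "process_page": process_page_activity}

def orchestrator() -> Generator[tuple, object, int]:
    """Each yield asks the driver to run an activity. The token only ever comes
    from an activity result, so replaying recorded results reproduces the same loop."""
    token: Optional[str] = None
    processed = 0
    while True:
        page = yield ("fetch_page", token)
        processed += (yield ("process_page", page["items"]))
        token = page["next"]
        if token is None:
            return processed

def run(orchestrator_fn, history: list) -> int:
    """Toy replay driver: reuses results already in `history`, runs any new
    activity calls and appends their results (a much-simplified event log)."""
    gen = orchestrator_fn()
    step = 0
    try:
        name, arg = next(gen)
        while True:
            if step < len(history):
                result = history[step]          # replay: activity already ran
            else:
                result = ACTIVITIES[name](arg)  # first time: run it and record
                history.append(result)
            step += 1
            name, arg = gen.send(result)
    except StopIteration as done:
        return done.value

history: list = []
print(run(orchestrator, history), len(history))  # 2500 6
```

Running the driver a second time with the same history returns the same total without calling any activity, which is the property the real framework relies on. For a very large number of pages the history keeps growing; one common option in Durable Functions is to call ContinueAsNew with the current token as the new input, which restarts the orchestration with a fresh history.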
I thought I would ask to make sure I am thinking about this correctly. I am also wondering whether this is possible or practical with this framework at all. Thank you.
For this requirement, I think you can use an ordinary Azure Function on the Consumption plan; you do not need Durable Functions. According to the Azure Functions documentation, a function app on the Consumption plan has a maximum of 1.5 GB of memory. I tested getting all fields of a single user through the Graph API, and the response was under 12 KB. So a JSON list of 10,000 users should be at most about 120 MB, which will not exceed the 1.5 GB limit.
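The size estimate above is simple arithmetic, shown here in decimal units (1 MB = 1,000 KB). The function name is illustrative only. The figure counts raw JSON text; deserialized objects in memory can be noticeably larger, so the wide margin under the limit is the useful result rather than the exact number.

```python
def estimated_payload_mb(user_count: int, kb_per_user: float) -> float:
    """Raw JSON size estimate in decimal megabytes (1 MB = 1,000 KB)."""
    return user_count * kb_per_user / 1000

CONSUMPTION_LIMIT_MB = 1.5 * 1000  # 1.5 GB per-instance memory limit, in decimal MB

estimate = estimated_payload_mb(10_000, 12)
print(estimate, estimate < CONSUMPTION_LIMIT_MB)  # 120.0 True
```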
However, you also need to pay attention to a few points:
1. The Graph API returns at most 999 records per request, so you need to call it multiple times, following the next-page link from each response.
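The paging loop in point 1 can be sketched as follows: keep requesting with the token from the previous response until none comes back. This is a plain-Python simulation, not Graph SDK code. fetch_all and make_fake_fetch are hypothetical names, and the fake fetcher stands in for Graph with a 999-item page size. In the C# Graph SDK, the PageIterator mentioned in the question runs this loop for you.

```python
from typing import Callable, List, Optional, Tuple

# A page fetcher takes the previous continuation token (None for the first call)
# and returns (items, next_token); next_token is None on the last page.
PageFetcher = Callable[[Optional[str]], Tuple[List[str], Optional[str]]]

def fetch_all(fetch_page: PageFetcher) -> Tuple[List[str], int]:
    """Keep requesting pages until no next token is returned."""
    items: List[str] = []
    token: Optional[str] = None
    requests = 0
    while True:
        page, token = fetch_page(token)
        items.extend(page)
        requests += 1
        if token is None:
            return items, requests

def make_fake_fetch(total: int, page_size: int = 999) -> PageFetcher:
    """Hypothetical stand-in for Graph: serves `total` fake users, `page_size` at a time."""
    def fetch(token: Optional[str]) -> Tuple[List[str], Optional[str]]:
        start = int(token) if token else 0
        end = min(start + page_size, total)
        next_token = str(end) if end < total else None
        return [f"user-{i}" for i in range(start, end)], next_token
    return fetch

users, calls = fetch_all(make_fake_fetch(10_000))
print(len(users), calls)  # 10000 11
```

With 999 records per page, 10,000 users take 11 requests.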
2. Keep the Consumption plan's timeout limit in mind. On the Consumption plan the default timeout is 5 minutes; you can raise it to a maximum of 10 minutes by setting the functionTimeout property in host.json (I think a 10-minute timeout is enough for your request).
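For reference, a minimal host.json for a Functions 2.x-or-later app that raises the timeout to the Consumption-plan maximum might look like this. "version": "2.0" is the host.json schema version used by current Functions runtimes; any other settings already in your host.json stay as they are.

```json
{
  "version": "2.0",
  "functionTimeout": "00:10:00"
}
```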