Tags: python, azure, azure-durable-functions

Azure Durable Functions. How to make activities share a common data object that is very large?


I need roughly 30 parallel worker activities to scan through a very large data dictionary in Python. The dictionary is hundreds of megabytes in size, and I do not want to make 30 separate copies of it.

I have noticed that passing parameters (or arguments) to an activity's main() is limited to a single argument, which is conventionally a string. The normal way of proceeding is to serialize the data object into a JSON string, pass that string to the activity, and have the activity deserialize it back into a native data object (Python, C#, JavaScript, etc.).

The Azure documentation is crystal clear on this point: it is not possible to pass multiple arguments to an activity. You must bundle all arguments into a single JSON payload and have the activity un-bundle them upon arrival.
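For concreteness, a minimal sketch of that bundling pattern (the activity name ScanActivity and the sample data are illustrative, not from the original post):

import json
import azure.durable_functions as df

def orchestrator_function(context: df.DurableOrchestrationContext):
    targets = ["alpha", "beta", "gamma"]       # illustrative targets
    big_dictionary = {"alpha": 1, "beta": 2}   # the real one is hundreds of MB

    # Each call re-serializes bigDictionary into the payload,
    # which is exactly what produces the 30 copies described below.
    tasks = [
        context.call_activity(
            "ScanActivity",
            json.dumps({"target": t, "bigDictionary": big_dictionary}),
        )
        for t in targets
    ]
    results = yield context.task_all(tasks)
    return results

main = df.Orchestrator.create(orchestrator_function)

# ScanActivity/__init__.py (named main in its own file) -- un-bundles the single JSON argument
def activity_main(payload: str) -> str:
    args = json.loads(payload)
    return str(args["bigDictionary"].get(args["target"], ""))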

The problem is that I want to call something like this:

async def main(target: str, bigDictionary: str) -> str:

There should be only one copy of bigDictionary in memory, shared by all 30 activities; only target changes between activities. Because of the limitation above, this is not possible with Azure Durable Functions, so I am forced to create 30 exact replicas of the entire bigDictionary, which easily overruns the memory quotas I am working with.

What is the solution here? How can several activities share the same resource without copying it?


Solution

  • You can store the dictionary as key-value data in Azure Blob Storage or Azure Table Storage. Each processing activity then reads the dictionary (or only the entries it needs) from this shared storage instead of receiving its own copy, and only the small target value is passed as the activity input. Uploading and processing can also run concurrently. See the sketch below.
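A minimal sketch of the shared-storage idea, assuming the azure-storage-blob SDK, a container named shared-data, a blob named bigDictionary.json, and a connection string in the AzureWebJobsStorage app setting (all of these names are illustrative):

import json
import os
from azure.storage.blob import BlobServiceClient

CONTAINER_NAME = "shared-data"    # assumed container
BLOB_NAME = "bigDictionary.json"  # assumed blob holding the serialized dictionary

def _blob_client():
    service = BlobServiceClient.from_connection_string(os.environ["AzureWebJobsStorage"])
    return service.get_blob_client(CONTAINER_NAME, BLOB_NAME)

def upload_dictionary(big_dictionary: dict) -> None:
    # Store the dictionary once; all 30 activities share this single copy.
    _blob_client().upload_blob(json.dumps(big_dictionary), overwrite=True)

# Activity entry point: the orchestrator now passes only the small target string.
async def main(target: str) -> str:
    big_dictionary = json.loads(_blob_client().download_blob().readall())
    return str(big_dictionary.get(target, ""))

Note that each activity still downloads and deserializes the blob on its own worker, so for truly large dictionaries it may be better to store individual entries as separate blobs or as Table Storage rows keyed by target, so that an activity fetches only the data it actually needs.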