Tags: python, azure, azure-cognitive-services, azure-durable-functions

How do I pass the input of a blob triggered starter function to activity functions within the orchestrator for Azure Durable Functions?


As the title says, I have a blob-triggered starter function, and I'm looking to send the blob that triggered it through to the activity functions as an input. The `.start_new()` method accepts an input, but that input must be JSON serializable, which the blob file (`func.InputStream`) is not.

I have tried decoding the `InputStream` object (`myblob`), which allows me to pass it as an input, but then I lose some of the basic functionality I would have when it's an `InputStream` object. I also need to make sure I can pass it into the subsequent cognitive services that will be called in the activity functions. See below for the code of the starter function.

Starter Function

`functions.json`

```json
{
  "scriptFile": "__init__.py",
  "bindings": [
    {
      "name": "myblob",
      "type": "blobTrigger",
      "direction": "in",
      "path": "container/{name}",
      "connection": "conn str"
    },
    {
      "name": "$return",
      "type": "blob",
      "path": "container/{name}",
      "connection": "AzureWebJobsStorage",
      "direction": "out"
    },
    {
      "name": "starter",
      "type": "durableClient",
      "direction": "in"
    }
  ]
}
```

`__init__.py`

```python
import logging

import azure.functions as func
import azure.durable_functions as df


async def main(myblob: func.InputStream, starter: str) -> func.InputStream:
    logging.info(f"Python blob trigger function processed blob\n"
                 f"Name: {myblob.name}\n"
                 f"Blob Size: {myblob.length} bytes\n\n")
    client = df.DurableOrchestrationClient(starter)
    # client_input must be JSON serializable
    instance_id = await client.start_new('Orchestrator', client_input=myblob)
    logging.info(f"Started orchestration with ID = '{instance_id}'.")
    return myblob
```

Which returns:

```
Exception: TypeError: class <class 'azure.functions.blob.InputStream'> does not expose a to_json function
```
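One way to work around the serialization error, for small blobs only, is to convert the stream into a JSON-serializable payload before calling `.start_new()`. The helpers below are a hypothetical sketch (the names `blob_to_payload` and `payload_to_bytes` are not from the original code): they base64-encode the raw bytes so arbitrary binary content survives a JSON round trip. Note that base64 inflates the payload by roughly a third and Durable Functions persists inputs in the orchestration history, so this is only reasonable for small files.

```python
import base64
import json


def blob_to_payload(name: str, content: bytes) -> str:
    """Build a JSON-serializable payload from a blob's name and raw bytes.

    Base64-encoding lets arbitrary binary content survive JSON serialization.
    """
    return json.dumps({
        "name": name,
        "content_b64": base64.b64encode(content).decode("ascii"),
    })


def payload_to_bytes(payload: str) -> bytes:
    """Recover the original blob bytes inside an activity function."""
    data = json.loads(payload)
    return base64.b64decode(data["content_b64"])
```

In the starter function this would be used as `client.start_new('Orchestrator', client_input=blob_to_payload(myblob.name, myblob.read()))`. For large blobs, passing only the blob name and re-downloading inside the activity is usually the better design.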


Solution

  • I had more issues with the blob trigger, so I switched to an event grid trigger instead. I pass the name of the blob that triggered the orchestration (retrieved from the event data) to the activity function, and use the Azure SDK to download the file into memory instead, which worked quite well.

    Reasons for moving away from a blob trigger:

    1. High latency: the standard blob trigger is not actually event driven and relies on polling
    2. For similar reasons, the storage logs the trigger relies on are created on a best-effort basis, so triggering is not guaranteed
    3. With large files (our use case required an upper limit of 2.5 GB), a blob trigger can be compute intensive, and at times the trigger will fail due to the size/length of the blob

    Some of this information is presented in the documentation linked below:

    https://learn.microsoft.com/en-us/azure/azure-functions/functions-bindings-storage-blob-trigger?pivots=programming-language-python&tabs=python-v2%2Cin-process
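The event-grid approach above can be sketched as follows. This is an illustrative outline, not the original code: the helper names, and the connection string passed to `download_blob_to_memory`, are assumptions. A `Microsoft.Storage.BlobCreated` event carries the blob's URL in its data payload, from which the container and blob name can be parsed; the activity function can then download the blob into memory with the `azure-storage-blob` SDK.

```python
from urllib.parse import urlparse


def blob_path_from_event_url(url: str) -> tuple:
    """Split an Event Grid BlobCreated URL into (container, blob name).

    Example URL shape (account name illustrative):
    https://myaccount.blob.core.windows.net/container/dir/name.csv
    """
    path = urlparse(url).path.lstrip("/")
    container, _, blob_name = path.partition("/")
    return container, blob_name


def download_blob_to_memory(conn_str: str, container: str, blob_name: str) -> bytes:
    """Download a blob's contents into memory inside an activity function.

    Requires the azure-storage-blob package; imported lazily so the
    URL-parsing helper above works without it.
    """
    from azure.storage.blob import BlobClient

    blob = BlobClient.from_connection_string(
        conn_str, container_name=container, blob_name=blob_name
    )
    return blob.download_blob().readall()
```

In the event-grid-triggered starter, `event.get_json()["url"]` would supply the URL; the starter passes the parsed blob name as the (JSON-serializable) orchestration input, and the activity calls `download_blob_to_memory` when it needs the bytes.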