Search code examples
azureazure-synapseazure-data-factory

Azure Data Factory - handling lookup activity limitation of 4194304 in size


I have a azure data factory pipeline with a lookup activity that check a JSON file.

The size is like below in azure: Azure Blog Size Screenshot

and when I download it, I see below values for the file. so it's not larger that the value the error states: "The size 5012186 of lookup activity result exceeds the limitation 4194304"

Size of the data as opened in Notepad ++

Also below is the design of my pipeline that gets stuck: Pipeline design - Lookup Activity to Read my model.json file to retrieve metadata

Any ideas on how to tackle this issue? thanks in advance


Solution

  • As lookup has the limitation of 5000 rows, you can try the below workaround for this.

    To overcome this the workaround is as mentioned in Microsoft Document

    Design a two-level pipeline where the outer pipeline iterates over an inner pipeline, which retrieves data that doesn't exceed the maximum rows or size.

    Possible solution:

    First, try to save your files list as JSON files to a folder of Blob storage with the size of 5000 rows.

    create Get Metadata activity which can fetch the files from the folder

    Get Metadata activity settings enter image description here

    Then create For-each activity to Iterate over files

    In for-each activity setting give items as @activity('Get Metadata1').output.childItems enter image description here

    For the files create a dataset and give the folder name manually and for filename use the dataset parameter, which we can give the filename in the lookup inside the parent ForEach.

    enter image description here

    Lookup activity inside Parent ForEach give the file name as @string(item().name)

    enter image description here

    Execute Pipeline activity:

    Before this create an array parameter in the child pipeline and pass the look up output inside ForEach to that in the Execute Pipeline activity. enter image description here

    Give look up output @activity('Lookup1').output.value

    enter image description here

    Now create inside the Child Pipeline and give the array parameter to the ForEach as @pipeline().parameters.ok enter image description here

    You can use which ever activity you want inside this ForEach, here I have used append.
    enter image description here

    Then create result1 variable as array and give value as @variables('arrayid') enter image description here

    The Output will be the array of all ids in the file

    enter image description hereBgULa02.png)