Search code examples
azureazure-synapse

Filter list of GetMetaData to top 30 (after titel of file)?


i have a GetMetaData function in my pipeline which gives me the childitems of the files inside a datalake 2. There are over 100 files in there. I only need a list of the top 30 (filtered by the name) files. How can i achieve that? I have no clue. I am pretty new to azure and still learnin.

The filename looks like this:

"productx_20220301T000000.csv" "productx_20220302T000000.csv" "productx_20220303T000000.csv" ... "productx_20230329T000000.csv"

I want the top 30 files after name (date). e.g. "productx_20230329T000000.csv" "productx_20230328T000000.csv" "productx_20230327T000000.csv" ...

If you have an idea i would appreciate a short explanation in the best case with a picture :)

thanks everyone for the help.


Solution

  • If you have files on every day(as per date in the file name) without missing any days in between the last 30 days, then you can try the below approach.

    First I have generated last n dates array in yyyyMMdd format using until activity. Then I have used filter activity on Get Meta data child items array. In this filter, it checks whether the dates array contains the extracted date from file name of child item or not.

    These are my sample input files:

    enter image description here

    Here, I have filtered for latest 4 days i.e, 20230330,20230329,20230328,20230327 files. This is my pipeline flow:

    enter image description here

    dates array in yyyyMMdd format for 4 days:

    enter image description here

    This is my pipeline JSON for your reference:

    {
        "name": "pipeline1",
        "properties": {
            "activities": [
                {
                    "name": "Get Metadata1",
                    "type": "GetMetadata",
                    "dependsOn": [
                        {
                            "activity": "Generates dates array until n days",
                            "dependencyConditions": [
                                "Succeeded"
                            ]
                        }
                    ],
                    "policy": {
                        "timeout": "0.12:00:00",
                        "retry": 0,
                        "retryIntervalInSeconds": 30,
                        "secureOutput": false,
                        "secureInput": false
                    },
                    "userProperties": [],
                    "typeProperties": {
                        "dataset": {
                            "referenceName": "source_csvfiles",
                            "type": "DatasetReference"
                        },
                        "fieldList": [
                            "childItems"
                        ],
                        "storeSettings": {
                            "type": "AzureBlobFSReadSettings",
                            "enablePartitionDiscovery": false
                        },
                        "formatSettings": {
                            "type": "DelimitedTextReadSettings"
                        }
                    }
                },
                {
                    "name": "intialize n-1 for n days",
                    "type": "SetVariable",
                    "dependsOn": [],
                    "userProperties": [],
                    "typeProperties": {
                        "variableName": "daysiterator",
                        "value": {
                            "value": "3",
                            "type": "Expression"
                        }
                    }
                },
                {
                    "name": "Generates dates array until n days",
                    "type": "Until",
                    "dependsOn": [
                        {
                            "activity": "intialize n-1 for n days",
                            "dependencyConditions": [
                                "Succeeded"
                            ]
                        }
                    ],
                    "userProperties": [],
                    "typeProperties": {
                        "expression": {
                            "value": "@equals(variables('daysiterator'), '-1')",
                            "type": "Expression"
                        },
                        "activities": [
                            {
                                "name": "Append to days array",
                                "type": "AppendVariable",
                                "dependsOn": [],
                                "userProperties": [],
                                "typeProperties": {
                                    "variableName": "lastndays",
                                    "value": {
                                        "value": "@formatDateTime(subtractFromTime(utcnow(), int(variables('daysiterator')), 'Day'),'yyyyMMdd')",
                                        "type": "Expression"
                                    }
                                }
                            },
                            {
                                "name": "decrement and store in temp",
                                "type": "SetVariable",
                                "dependsOn": [
                                    {
                                        "activity": "Append to days array",
                                        "dependencyConditions": [
                                            "Succeeded"
                                        ]
                                    }
                                ],
                                "userProperties": [],
                                "typeProperties": {
                                    "variableName": "temp",
                                    "value": {
                                        "value": "@string(add(int(variables('daysiterator')), -1))",
                                        "type": "Expression"
                                    }
                                }
                            },
                            {
                                "name": "reassign temp to daysterator",
                                "type": "SetVariable",
                                "dependsOn": [
                                    {
                                        "activity": "decrement and store in temp",
                                        "dependencyConditions": [
                                            "Succeeded"
                                        ]
                                    }
                                ],
                                "userProperties": [],
                                "typeProperties": {
                                    "variableName": "daysiterator",
                                    "value": {
                                        "value": "@variables('temp')",
                                        "type": "Expression"
                                    }
                                }
                            }
                        ],
                        "timeout": "0.12:00:00"
                    }
                },
                {
                    "name": "Filter1",
                    "type": "Filter",
                    "dependsOn": [
                        {
                            "activity": "Get Metadata1",
                            "dependencyConditions": [
                                "Succeeded"
                            ]
                        }
                    ],
                    "userProperties": [],
                    "typeProperties": {
                        "items": {
                            "value": "@activity('Get Metadata1').output.childItems",
                            "type": "Expression"
                        },
                        "condition": {
                            "value": "@contains(variables('lastndays'),substring(item().name, add(indexof(item().name,'_'),1), 8))",
                            "type": "Expression"
                        }
                    }
                }
            ],
            "variables": {
                "lastndays": {
                    "type": "Array"
                },
                "daysiterator": {
                    "type": "String"
                },
                "temp": {
                    "type": "String"
                }
            },
            "annotations": []
        }
    }
    

    Filter activity result array:

    enter image description here

    If your files missing on some days, then try the below approach.

    • In this case Get the child items JSON array and Give it to a Foreach. Inside Foreach use append variable activity to an array to get only dates (int(substring(item().name, add(indexof(item().name,'_'),1), 8))).
    • Here we need to sort this array in Descending order. For sorting of array, you can follow this approach by @AnnuKumari-MSFT.
    • After getting sorted array of Dates, use same filter activity. Give child items of Get Meta data child items to items of filter and give the condition like this. @contains(take(variables('Sorted dates array'),30),int(substring(item().name, add(indexof(item().name,'_'),1), 8))).