Search code examples
csvexport-to-csvazure-logic-apps

Skip first and last x rows in CSV stored in SFTP using Azure Logic App


I have a logic app which uses 2 Compose, the first one skips the first 3 rows

take(skip(variables('CSV Data'),3),sub(length(variables('CSV Data')),1))

and the second one skips the last 2 rows

take(outputs('Skip_Header'),sub(length(outputs('Skip_Header')),2))

This works great when I initialise the CSV Data Variable with an array

[
  "rubbish1,rubbish2,rubbish3",
  "blank1,blank2,blank3",
  "header1,header2,header3",
  "data1,data2,data3",
  "data4,data5,data6",
  "data7,data8,data9"
]

Below is my working Logic App definition:

{
    "definition": {
        "$schema": "https://schema.management.azure.com/providers/Microsoft.Logic/schemas/2016-06-01/workflowdefinition.json#",
        "actions": {
            "Initialize_CSV_Data": {
                "inputs": {
                    "variables": [
                        {
                            "name": "CSV Data",
                            "type": "array",
                            "value": [
                                "rubbish1,rubbish2,rubbish3",
                                "blank1,blank2,blank3",
                                "header1,header2,header3",
                                "data1,data2,data3",
                                "data4,data5,data6",
                                "data7,data8,data9"
                            ]
                        }
                    ]
                },
                "runAfter": {},
                "type": "InitializeVariable"
            },
            "Skip_Footer": {
                "inputs": "@take(outputs('Skip_Header'),sub(length(outputs('Skip_Header')),2))",
                "runAfter": {
                    "Skip_Header": [
                        "Succeeded"
                    ]
                },
                "type": "Compose"
            },
            "Skip_Header": {
                "inputs": "@take(skip(variables('CSV Data'),3),sub(length(variables('CSV Data')),1))",
                "runAfter": {
                    "Initialize_CSV_Data": [
                        "Succeeded"
                    ]
                },
                "type": "Compose"
            }
        },
        "contentVersion": "1.0.0.0",
        "outputs": {},
        "parameters": {},
        "triggers": {
            "manual": {
                "inputs": {},
                "kind": "Http",
                "type": "Request"
            }
        }
    },
    "parameters": {}
}

However that was just for testing, because in reality the data is actually in a CSV format stored in a SFTP, so if I change the initialise variable from array

[
  "rubbish1,rubbish2,rubbish3",
  "blank1,blank2,blank3",
  "header1,header2,header3",
  "data1,data2,data3",
  "data4,data5,data6",
  "data7,data8,data9"
]

to string

rubbish1,rubbish2,rubbish3
blank1,blank2,blank3
header1,header2,header3
data1,data2,data3
data4,data5,data6
data7,data8,data9

Then that will a truer reflection of the actual scenario I have. But what the Logic App does now is that only ignores the first 3 characters and last 2 characters instead of the first 3 rows and last 2 rows. The new definition is below:

{
    "definition": {
        "$schema": "https://schema.management.azure.com/providers/Microsoft.Logic/schemas/2016-06-01/workflowdefinition.json#",
        "actions": {
            "Initialize_CSV_Data": {
                "inputs": {
                    "variables": [
                        {
                            "name": "CSV Data",
                            "type": "string",
                            "value": "rubbish1,rubbish2,rubbish3\nblank1,blank2,blank3\nheader1,header2,header3\ndata1,data2,data3\ndata4,data5,data6\ndata7,data8,data9"
                        }
                    ]
                },
                "runAfter": {},
                "type": "InitializeVariable"
            },
            "Skip_Footer": {
                "inputs": "@take(outputs('Skip_Header'),sub(length(outputs('Skip_Header')),2))",
                "runAfter": {
                    "Skip_Header": [
                        "Succeeded"
                    ]
                },
                "type": "Compose"
            },
            "Skip_Header": {
                "inputs": "@take(skip(variables('CSV Data'),3),sub(length(variables('CSV Data')),1))",
                "runAfter": {
                    "Initialize_CSV_Data": [
                        "Succeeded"
                    ]
                },
                "type": "Compose"
            }
        },
        "contentVersion": "1.0.0.0",
        "outputs": {},
        "parameters": {},
        "triggers": {
            "manual": {
                "inputs": {},
                "kind": "Http",
                "type": "Request"
            }
        }
    },
    "parameters": {}
}

Screenshot below also shows the difference between the 2 Logic Apps, everything is the same except for "Initialize CSV Data": enter image description here


Solution

  • Considering the same sample that you have provided, Im using 3 compose Connectors in order to skip the first Rows. Here is my Logic app

    enter image description here

    I'm using Split To Get Rows Connector in order to convert the csv file into Arrays. Below is the expression in it

    split(body('Get_blob_content_(V2)'),'
    
    ')
    

    The above steps creates an extra empty array object In order to remove it Im using the next step Rows Array connector. Below is the expression in it

    take(outputs('Split_To_Get_Rows'),sub(length(outputs('Split_To_Get_Rows')),1))
    

    Going further In Skip First Number of Rows Connector is used to skip the first 3 rows. Here is the expression in it

    take(skip(outputs('Rows_Array'),3),sub(length(outputs('Rows_Array')),1))
    

    Here is the Final Result:

    enter image description here

    UPDATED ANSWER


    To remove x number of rows of footer then the expression would be

    take(outputs('Skip_First_Number_Of_Rows_(Header)'),length(skip(outputs('Skip_First_Number_Of_Rows_(Header)'),0)))
    

    Consider the below to be the array :-

    [
    rubbish1,rubbish2,rubbish3
    blank1,blank2,blank3
    header1,header2,header3
    data1,data2,data3
    sample1,sample2,sample3
    data4,data5,data6
    rubbish4,rubbish5,rubbish6
    footer1,footer2,footer3
    ]
    

    then here is the result for the mentioned expression

    enter image description here

    and if you need to skip 2 rows from below then the expression would be

    take(outputs('Skip_First_Number_Of_Rows_(Header)'),length(skip(outputs('Skip_First_Number_Of_Rows_(Header)'),1)))
    

    and here will be the result

    enter image description here