Tags: azure, azure-data-factory, azure-data-lake, u-sql

Schedule U-SQL jobs in Azure Data Factory


I've got the following issue. I'd like to schedule three U-SQL jobs at the following times: 02:00 UTC, 03:00 UTC and 04:00 UTC every day. I know that, by default, jobs in the pipeline are executed at 12:00 AM UTC, hence all my jobs run at the same time, which is not what I want.

I read the documentation and it says that I should consider the offset parameter in the dataset template. However, when I try to set this, the following error occurs: error.

I do not know how to set a runtime other than 12:00 AM for a U-SQL job. Can you provide me some info on how to do that properly? In addition, I attach my dataset and pipeline templates:
Dataset

{
    "name": "TransformedData2",
    "properties": {
        "published": false,
        "type": "AzureDataLakeStore",
        "linkedServiceName": "ADLstore_linkedService_scrapper",
        "typeProperties": {
            "fileName": "TestOutput2.csv",
            "folderPath": "transformedData/",
            "format": {
                "type": "TextFormat",
                "rowDelimiter": "\n",
                "columnDelimiter": ","
            }
        },
        "availability": {
            "frequency": "Day",
            "interval": 1,
            "style": "StartOfInterval"
        }
    }
}

Pipeline

{
    "name": "filtering",
    "properties": {
        "activities": [
            {
                "type": "DataLakeAnalyticsU-SQL",
                "typeProperties": {
                    "scriptPath": "usqljobs\\cleanStatements.txt",
                    "scriptLinkedService": "AzureStorageLinkedService",
                    "degreeOfParallelism": 5,
                    "priority": 100,
                    "parameters": {}
                },
                "outputs": [
                    {
                        "name": "TransformedData2"
                    }
                ],
                "scheduler": {
                    "frequency": "Day",
                    "interval": 1,
                    "style": "StartOfInterval"
                },
                "name": "Brajan filtering",
                "linkedServiceName": "AzureDataLakeAnalyticsLinkedService"
            }
        ],
        "start": "2017-07-02T09:50:00Z",
        "end": "2018-06-30T03:00:00Z",
        "isPaused": false,
        "hubName": "datafactoryfin_hub",
        "pipelineMode": "Scheduled"
    }
}

Thanks


Solution

  • Using the Offset attribute can get a little messy, as you'll need to re-provision the time slices at the dataset level.

    As an alternative, I would suggest using the Delay attribute for the activity. This gives you more control and does not require time slices to be re-provisioned.

    So in your JSON...

    So in your JSON...

    {
        "name": "filtering",
        "properties": {
            "activities": [
                {
                    "type": "DataLakeAnalyticsU-SQL",
                    "typeProperties": {
                        "scriptPath": "usqljobs\\cleanStatements.txt",
                        "scriptLinkedService": "AzureStorageLinkedService",
                        "degreeOfParallelism": 5,
                        "priority": 100,
                        "parameters": {}
                    },
                    "outputs": [
                        {
                            "name": "TransformedData2"
                        }
                    ],
                    "policy": {
                        "delay": "02:00:00" // <<<<< 2:00am start
                    },
                    "scheduler": {
                        "frequency": "Day",
                        "interval": 1,
                        "style": "StartOfInterval"
                    },
                    "name": "Brajan filtering",
                    "linkedServiceName": "AzureDataLakeAnalyticsLinkedService"
                }
            ],
            "start": "2017-07-02T09:50:00Z",
            "end": "2018-06-30T03:00:00Z",
            "isPaused": false,
            "hubName": "datafactoryfin_hub",
            "pipelineMode": "Scheduled"
        }
    }

    Then you'll of course need additional activities for the 3:00am and 4:00am versions, using delay values of "03:00:00" and "04:00:00" respectively, each writing to its own output dataset.
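    For example, the 3:00am activity could be sketched like this (a sketch only; `TransformedData3` is an assumed name for a second output dataset you would define with the same daily availability, and the 4:00am activity would be identical with "04:00:00" and a third dataset):

    ```json
    {
        "type": "DataLakeAnalyticsU-SQL",
        "typeProperties": {
            "scriptPath": "usqljobs\\cleanStatements.txt",
            "scriptLinkedService": "AzureStorageLinkedService",
            "degreeOfParallelism": 5,
            "priority": 100,
            "parameters": {}
        },
        "outputs": [
            {
                "name": "TransformedData3"
            }
        ],
        "policy": {
            "delay": "03:00:00"
        },
        "scheduler": {
            "frequency": "Day",
            "interval": 1,
            "style": "StartOfInterval"
        },
        "name": "Brajan filtering 3am",
        "linkedServiceName": "AzureDataLakeAnalyticsLinkedService"
    }
    ```

    This goes into the same "activities" array; since each daily slice starts at midnight UTC, the delay shifts the actual run to the hour you want.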

    Check out this link for more info:

    https://learn.microsoft.com/en-us/azure/data-factory/data-factory-scheduling-and-execution

    Delay is mentioned about a quarter of the way down the page.

    Hope this helps