I've got a following issue. I'd like to schedule three U-SQL jobs in following timing: 02:00UTC, 03:00UTC and 04:00UTC everyday. I know that by default, jobs in the pipeline are executed at 12:00AM UTC hence all my jobs run at the same time which is not what I want.
I red the documentation and it is written that I should consider offset parameter in dataset template. However when I try to set this the following error occurs: .
I do not knot how to set different than 12:00AM runtime of U-SQL job. Can You provide me some info on how to do that properly? In addition I attach my template of a dataset and a pipeline:
Dataset
{
"name": "TransformedData2",
"properties": {
"published": false,
"type": "AzureDataLakeStore",
"linkedServiceName": "ADLstore_linkedService_scrapper",
"typeProperties": {
"fileName": "TestOutput2.csv",
"folderPath": "transformedData/",
"format": {
"type": "TextFormat",
"rowDelimiter": "\n",
"columnDelimiter": ","
}
},
"availability": {
"frequency": "Day",
"interval": 1,
"style": "StartOfInterval"
}
}
}
Pipeline
{
"name": "filtering",
"properties": {
"activities": [
{
"type": "DataLakeAnalyticsU-SQL",
"typeProperties": {
"scriptPath": "usqljobs\\cleanStatements.txt",
"scriptLinkedService": "AzureStorageLinkedService",
"degreeOfParallelism": 5,
"priority": 100,
"parameters": {}
},
"outputs": [
{
"name": "TransformedData2"
}
],
"scheduler": {
"frequency": "Day",
"interval": 1,
"style": "StartOfInterval"
},
"name": "Brajan filtering",
"linkedServiceName": "AzureDataLakeAnalyticsLinkedService"
}
],
"start": "2017-07-02T09:50:00Z",
"end": "2018-06-30T03:00:00Z",
"isPaused": false,
"hubName": "datafactoryfin_hub",
"pipelineMode": "Scheduled"
}
}
Thanks
Using the Offset attribute can get a little messy as you'll need to re-provision time slices at the dataset level.
As an alternative I would suggest using the Delay attribute at for the activity. This gives more control and does not require time slices to be re-provisioned.
So in your JSON...
{
"name": "filtering",
"properties": {
"activities": [
{
"type": "DataLakeAnalyticsU-SQL",
"typeProperties": {
"scriptPath": "usqljobs\\cleanStatements.txt",
"scriptLinkedService": "AzureStorageLinkedService",
"degreeOfParallelism": 5,
"priority": 100,
"parameters": {}
},
"outputs": [
{
"name": "TransformedData2"
}
],
"policy": {
"delay": "02:00:00" // <<<<< 2:00am start
},
"scheduler": {
"frequency": "Day",
"interval": 1,
"style": "StartOfInterval"
},
"name": "Brajan filtering",
"linkedServiceName": "AzureDataLakeAnalyticsLinkedService"
}
],
"start": "2017-07-02T09:50:00Z",
"end": "2018-06-30T03:00:00Z",
"isPaused": false,
"hubName": "datafactoryfin_hub",
"pipelineMode": "Scheduled"
}
Then you'll of course need additional activities for the 3:00am and 4:00am versions.
Check out this link for more info:
https://learn.microsoft.com/en-us/azure/data-factory/data-factory-scheduling-and-execution
Delay is mentioned about a quarter of the way down the page.
Hope this helps