Search code examples
azureazureportalazure-machine-learning-service

Send alert if Azure ML pipeline fails


I am trying to add an alert if Azure ML pipeline fails. It looks that one of the ways is to create a monitor in the Azure Portal. The problem is that I cannot find a correct signal name (required when setting up condition), which would identify pipeline fail. What signal name should I use? Or is there another way to send an email if Azure pipeline fails?


Solution

  • What signal name should I use?

    You can use PipelineChangeEvent category of AmlPipelineEvent table to view events when ML pipeline draft or endpoint or module are accessed (read, created, or deleted).

    For example, according to documentation, use AmlComputeJobEvent to get failed jobs in the last five days:

    AmlComputeJobEvent
    | where TimeGenerated > ago(5d) and EventType == "JobFailed"
    | project  TimeGenerated , ClusterId , EventType , ExecutionState , ToolType
    

    Updated answer:

    According to Laurynas G:

    AmlRunStatusChangedEvent 
    | where Status == "Failed" or Status == "Canceled"
    

    You can refer to Monitor Azure Machine Learning, Log & view metrics and log files and Troubleshooting machine learning pipelines