amazon-web-services  aws-glue  aws-glue-data-catalog

How can I monitor glue crawler execution stats?


I am using AWS Glue for ETL, and I couldn't find a way to monitor Glue crawler execution stats on AWS. I know how to monitor Glue jobs, as described in this doc: https://docs.aws.amazon.com/glue/latest/dg/monitoring-awsglue-with-cloudwatch-metrics.html. Is there a similar way to monitor Glue crawler executions?

I can check the crawler's logs in CloudWatch, but they are not very readable. It is hard for me to figure out how many invocations happen during a specified time period.


Solution

  • I asked the AWS support center the same question. Here is the answer:

    From my understanding, the Glue crawler does not publish CloudWatch metrics for the execution statistics that you are looking to monitor. However, a Glue crawler is able to publish logs to a CloudWatch Log Group and Log Stream(s). Based on these log event messages, you can create a metric filter [1] that matches a particular filter pattern and generates your own metric to monitor and alarm on. For example, if the metric filter detects the pattern "Crawler has finished running and is in state READY", it publishes a value to a metric in your custom namespace. Here are the steps to create a metric filter:

    1) Open the CloudWatch Log Groups console
    2) Select the Glue crawler log group
    3) Select Metric filters, choose Create metric filter
    4) In Filter pattern, enter the pattern that you want to match in the log streams, e.g. "Crawler has finished running and is in state READY", then choose Next
        4a) You can test your filter pattern against a log stream or by manually specifying log event messages
    5) Enter a filter name, a custom metric namespace, a metric name, and a metric value. The metric value is what will be published to the metric for each match, e.g. 1. Then choose Next
    6) Review the metric filter configuration and choose Create metric filter
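
The console steps above can also be scripted. The following is a minimal sketch using boto3, not an official recipe: the function names, the log group, the namespace, and the metric name are all illustrative assumptions, and the clients are passed in so that the request-building logic can be checked without AWS access. It creates the metric filter and then sums the resulting metric over a time window, which gives the number of finished crawler runs in that period.

```python
from datetime import datetime, timedelta, timezone

# Log message quoted in the support answer; the crawler writes it when a run ends.
FINISHED_PHRASE = "Crawler has finished running and is in state READY"


def build_metric_filter_request(log_group, filter_name, namespace, metric_name,
                                phrase=FINISHED_PHRASE):
    """Build the keyword arguments for the CloudWatch Logs put_metric_filter call."""
    return {
        "logGroupName": log_group,
        "filterName": filter_name,
        # Wrapping the phrase in double quotes makes the filter match it as a whole.
        "filterPattern": '"' + phrase + '"',
        "metricTransformations": [{
            "metricNamespace": namespace,
            "metricName": metric_name,
            "metricValue": "1",  # publish 1 for every matching log event
        }],
    }


def create_metric_filter(logs_client, log_group, filter_name, namespace, metric_name):
    """Create (or overwrite) the metric filter. logs_client is boto3.client("logs")."""
    request = build_metric_filter_request(log_group, filter_name, namespace, metric_name)
    logs_client.put_metric_filter(**request)
    return request


def count_finished_runs(cloudwatch_client, namespace, metric_name, hours=24):
    """Sum the custom metric over the last `hours` hours.

    cloudwatch_client is boto3.client("cloudwatch"). Hourly buckets are used, so keep
    `hours` at or below 1440 to stay within the per-call datapoint limit.
    """
    end = datetime.now(timezone.utc)
    start = end - timedelta(hours=hours)
    response = cloudwatch_client.get_metric_statistics(
        Namespace=namespace,
        MetricName=metric_name,
        StartTime=start,
        EndTime=end,
        Period=3600,
        Statistics=["Sum"],
    )
    return int(sum(point["Sum"] for point in response["Datapoints"]))
```

With boto3 installed and credentials configured, a call might look like `create_metric_filter(boto3.client("logs"), "/aws-glue/crawlers", "crawler-finished", "Custom/GlueCrawler", "CrawlerRunsFinished")`. The log group name here is an assumption; check which log group your crawler actually writes to. The metric only counts log events that arrive after the filter is created.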
    

    CloudWatch Events can invoke a target based on Glue crawler state changes. For example, if the crawler state changes to Failed, the rule could invoke an SNS Topic target that sends you an email. Below are the steps to create a CloudWatch Events rule:

    1) Open the CloudWatch Rules console
    2) Choose Create rule
    3) In Service Name, select Glue, in Event Type select Glue Crawler State Change
    4) Choose Specific state(s) and choose Failed
    5) Add a Target, for example SNS Topic, choose Configure details
    6) Enter a Rule name and choose Create rule
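
The rule described in these steps can be sketched with boto3 as well. This is an illustration under assumptions rather than the support engineer's own method: the function names, the rule name, the target ID, and the topic ARN are hypothetical, and the client is passed in as a parameter. The event pattern matches the "Glue Crawler State Change" event type from step 3, restricted to the Failed state from step 4.

```python
import json


def build_crawler_state_pattern(states=("Failed",)):
    """Event pattern matching Glue crawler state-change events in the given states."""
    return {
        "source": ["aws.glue"],
        "detail-type": ["Glue Crawler State Change"],
        "detail": {"state": list(states)},
    }


def create_crawler_failure_rule(events_client, rule_name, topic_arn):
    """Create the rule and attach an SNS topic as its target.

    events_client is boto3.client("events"); topic_arn is the ARN of an existing topic.
    """
    pattern = build_crawler_state_pattern()
    events_client.put_rule(
        Name=rule_name,
        EventPattern=json.dumps(pattern),  # the API expects the pattern as a JSON string
        State="ENABLED",
    )
    events_client.put_targets(
        Rule=rule_name,
        Targets=[{"Id": "notify-sns", "Arn": topic_arn}],  # "notify-sns" is an arbitrary ID
    )
    return pattern
```

For example: `create_crawler_failure_rule(boto3.client("events"), "glue-crawler-failed", topic_arn)`. Unlike the console, the API does not adjust the SNS topic's access policy, so the topic must already allow `events.amazonaws.com` to publish to it, and it needs an email subscription for the notification to reach you.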
    

    Enjoy the rest of your day.