Search code examples
amazon-web-servicesamazon-cloudwatchaws-cloudwatch-log-insights

Unnesting a comma seperated string in AWS cloudwatch insights query


I have a log message like this

{"level":"INFO","location":"log_missing_items:xxx","message":"Missing_Items: Item1, Item2, Item3 ...","timestamp":"2025-01-02 21:46:15,382+0000","service":"xxxx","xray_trace_id":"xxxxxxxxx"}

I am trying to write a cloudwatch insights query to get only the missing items as rows so that it can be displayed in a cloudwatch insights query widget in a dashboard.

Expected result

Item1
Item2
Item3

is this possible with AWS cloudwatch Insights query ?


Solution

  • You can now use OpenSearch SQL language introduced in december 2024 during the AWS re:Invent 2024 to query your logs with Cloudwatch logs insight. You will find more details in this blog written by Elizabeth Fuentes and her team.

    Here's an SQL example to achieve your needs:

    SELECT distinct get(split(regexp_extract(`@message`, 'Missing_Items: (.*)'), '[,]'), b.index) as missingItem
    FROM `LogGroupA` a
    INNER JOIN (
        SELECT ROW_NUMBER() OVER w - 1 index, MAX(regexp_count(regexp_extract(`@message`, 'Missing_Items: (.*)'), '[^,]+')) OVER w max_missing_items
        FROM `LogGroupA`
        WINDOW w AS (ORDER BY id)
    ) b
    WHERE get(split(regexp_extract(`@message`, 'Missing_Items: (.*)'), '[,]'), b.index) <> ''
    AND b.index < b.max_missing_items
    
    • regexp_extract(`@message`, 'Missing_Items: (.*)') will extract the list of missing items for one log line. For example: Item1, Item2, Item3.
    • split(..., '[,]') will build an array of the missing items from above: [Item1, Item2, Item3].
    • get(..., b.index) will get the b.index element of the array. b.index start from 0 to the maximum size of missing items array.
    • The inner SELECT generates the sequence that will be used to get elements in the array of missing items. Note that you will need to have more logs line than the maximum number of missing items array. That should not be a problem unless your maximum number of missing items is big, in that case you should validate that value.
    • regexp_count(..., '[^,]+') will count the number of elements in the missing items list. Here Item1, Item2, Item3 will give 3. I didn't find any useable size functions to get the size of an array.

    That query could have been much easier if Generator functions were supported by CloudWatch Logs. The explode function for example would do it directly (Something like SELECT distinct explode(split(regexp_extract(`@message`, 'Missing_Items: (.*)'), '[,]')) as missingItem FROM `LogGroupA` ).

    More details on supported OpenSearch SQL commands and functions here.