regex, amazon-cloudwatch, aws-cloudwatch-log-insights

CloudWatch Insights - Group logs by url with unique ids removed


I'm looking to use CloudWatch Logs Insights to group logs by a request url field, however the url can contain 0-2 unique numerical identifiers that I'd like to be ignored when doing the grouping.

Some examples of urls:

/dev/user
/dev/user/123
/dev/user/123/inventory/4
/dev/server/3/statistics

The groups would look something like:

/dev/user
/dev/user/
/dev/user//inventory/
/dev/server//statistics
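
To make the target concrete, here is a minimal Python sketch of the grouping I'm after. It is not a Logs Insights query, only an illustration of the expected keys; `group_key` and the sample list are names made up for this example.

```python
import re
from collections import Counter


def group_key(url: str) -> str:
    """Remove every run of ASCII digits so urls that differ only by id share a key."""
    return re.sub(r"[0-9]+", "", url)


# The sample urls from the question.
urls = [
    "/dev/user",
    "/dev/user/123",
    "/dev/user/123/inventory/4",
    "/dev/server/3/statistics",
]

# Count how many urls fall into each group.
counts = Counter(group_key(u) for u in urls)
for key, n in counts.items():
    print(key, n)
```

Any two urls that differ only in their numeric ids, such as /dev/user/123 and /dev/user/456, end up under the same key.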

I have something quite close to what I need: it extracts the section of the url before the first optional identifier and the section between the first and second identifiers, then concatenates the two. It isn't totally reliable, though. This is where I'm at currently; @message is valid JSON which contains an 'endpoint' field that looks like one of the urls above:

fields @message
| parse endpoint /(\bdev)\/(?<@prefix>[^0-9]+)(?:[0-9]+)(?<@suffix>[^0-9]+)/
| stats count(*) by @prefix

While this query will work with endpoints like '/dev/accounts/1' it ignores endpoints like '/dev/accounts' as it doesn't have all of the components the regex is looking for, which means I'm missing a lot of results.


Solution

  • It turns out I can put a question mark after a group's closing parenthesis to make that whole group optional, which has resolved the last issue I was having.

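
As a rough illustration of the optional-group idea, here is a Python sketch. It assumes the question marks go after the digit group and after the suffix group, which is one plausible reading of the fix rather than the exact regex used. Python group names cannot contain `@`, so `prefix` and `suffix` stand in for `@prefix` and `@suffix`, and `split_endpoint` is a hypothetical helper.

```python
import re

# Same shape as the pattern in the question, but with "?" after the digit
# group and after the suffix group so both become optional (assumed placement).
PATTERN = re.compile(r"\bdev/(?P<prefix>[^0-9]+)(?:[0-9]+)?(?P<suffix>[^0-9]+)?")


def split_endpoint(endpoint: str):
    """Return (prefix, suffix) for an endpoint, or None if it does not match."""
    match = PATTERN.search(endpoint)
    if match is None:
        return None
    # An optional group that did not take part in the match yields None.
    return match.group("prefix"), match.group("suffix") or ""


for endpoint in [
    "/dev/user",
    "/dev/user/123",
    "/dev/user/123/inventory/4",
    "/dev/server/3/statistics",
]:
    prefix, suffix = split_endpoint(endpoint)
    print("/dev/" + prefix + suffix)
```

With the groups optional, an endpoint with no identifier such as /dev/user still matches and simply produces an empty suffix, so it is no longer dropped from the results.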