Regex matching multiple groups


I am very new to regex and am trying to create a filter rule to get some matches. For instance, I have a query result like this:

application_outbound_api_external_metrics_service_plus_success_total
application_outbound_api_external_metrics_service_plus_failure_total
application_inbound_api_metrics_service_success_total
application_inbound_api_metrics_service_failure_total

Now I want to keep ONLY the lines that contain "outbound" AND "service_plus" AND "failure".

I tried to play with groups, but somewhere I am misunderstanding how they work, because my regex returns the wrong results.

The regex I used:

/(?:outbound)|(?:service_plus)|(?:failure)/

Solution

  • You should use multiple lookahead assertions:

    ^(?=.*outbound)(?=.*service_plus)(?=.*failure).*\n?
    

    The above should use the MULTILINE flag so that ^ is interpreted as start of string or start of line.

    1. ^ - matches start of string or start of line.
    2. (?=.*outbound) - asserts that at the current position we can match 0 or more non-newline characters followed by 'outbound' without consuming any characters (i.e. the scan position is not advanced)
    3. (?=.*service_plus) - asserts that at the current position we can match 0 or more non-newline characters followed by 'service_plus' without consuming any characters (i.e. the scan position is not advanced)
    4. (?=.*failure) - asserts that at the current position we can match 0 or more non-newline characters followed by 'failure' without consuming any characters (i.e. the scan position is not advanced)
    5. .*\n? - matches 0 or more non-newline characters, optionally followed by a newline (optional in case the final line does not terminate in a newline character)

    See RegEx Demo

    In Python, for example:

    import re
    
    lines = """application_outbound_api_external_metrics_service_plus_success_total
    application_outbound_api_external_metrics_service_plus_failure_total
    application_inbound_api_metrics_service_success_total
    application_inbound_api_metrics_service_failure_total
    failureoutboundservice_plus"""
    
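    # re.M (MULTILINE) makes ^ match at the start of every line, not only at the start of the string
    rex = re.compile(r'^(?=.*outbound)(?=.*service_plus)(?=.*failure).*\n?', re.M)
    
    # findall returns each matching line (including its trailing newline, if any); join them back together
    filtered_lines = ''.join(rex.findall(lines))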
    print(filtered_lines)
    

    Prints:

    application_outbound_api_external_metrics_service_plus_failure_total
    failureoutboundservice_plus
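
To see why the pattern in the question returns too many lines, it may help to compare the two approaches side by side. The following is a minimal sketch, assuming the lines are already split into a Python list; the names `terms`, `any_of` and `all_of` are made up for illustration. An alternation such as `outbound|service_plus|failure` succeeds as soon as ANY one term is found, whereas one lookahead per term succeeds only when ALL of them are present.

```python
import re

lines = [
    "application_outbound_api_external_metrics_service_plus_success_total",
    "application_outbound_api_external_metrics_service_plus_failure_total",
    "application_inbound_api_metrics_service_success_total",
    "application_inbound_api_metrics_service_failure_total",
]

# Hypothetical list of required substrings (not part of the original answer)
terms = ["outbound", "service_plus", "failure"]

# Alternation, equivalent to the pattern from the question:
# a line matches if it contains ANY of the terms.
any_of = re.compile("|".join(re.escape(t) for t in terms))

# One lookahead per term, anchored at the start of the line:
# a line matches only if it contains ALL of the terms, in any order.
all_of = re.compile("^" + "".join(f"(?=.*{re.escape(t)})" for t in terms))

any_matches = [line for line in lines if any_of.search(line)]
all_matches = [line for line in lines if all_of.search(line)]

print(len(any_matches))  # 3: every line except the "inbound ... success" one
print(all_matches)       # only the "outbound ... service_plus ... failure" line
```

Because each line is tested separately here, the MULTILINE flag and the trailing `.*\n?` are not needed; `re.escape` is used only so that a term containing regex metacharacters would still be matched literally.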