regex perl backreference capturing-group

How to fetch inner values from nested regex backreferences


I receive input from the server in the following manner (sample input data):

[1284336000]: host1;event1;flag;state;counter;errors or warnings
[1284336000]: host2;event1;flag;state;counter;errors or warnings
[1284336000]: host1;event2;flag;state;counter;errors or warnings
[1284336000]: host2;event2;flag;state;counter;errors or warnings

I have to match the input and, based on the match, create a variable whose value is hostname-eventname.

My regex:

^\[\d+\]:\s((host1);(event1)|(host2);(event2)|(host3);(event2)|(host2);(event1));(\w+);(\w+);(\d).+$

I want the host name and the event name captured separately in numbered variables such as $2 and $3.

For example, consider this input:

[1284336000]: host1;event1;flag;state;counter;errors or warnings

I need to create a variable whose value is <hostname-eventname>, built from the host name and event name captured by the match above.

Say,

$myVar=$2-$3  (that is, $myVar=host1-event1)

I cannot apply a split or any other post-processing. So no programming: I can only read the input data. And yes, the regex flavor is Perl.

I hope that makes my question clear.


Solution

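  • You need to use the branch-reset operator, (?|…|…|…), available since Perl 5.10. In your regex every alternative gets its own group numbers, so the host and event end up in $2/$3, $4/$5, $6/$7 or $8/$9 depending on which alternative matched. Inside a branch reset, each alternative starts numbering from the same point, so the host is always $1 and the event is always $2: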

    ^\[\d+\]:\s(?|(host1);(event1)|(host2);(event2)|(host3);(event2)|(host2);(event1));(\w+);(\w+);(\d).+$
    

    Or more legibly:

    m{
        ^ \[ \d+ \] : \s
        (?| (host1);(event1)  # $1, $2
          | (host2);(event2)  # $1, $2
          | (host3);(event2)  # $1, $2
          | (host2);(event1)  # $1, $2
        )
        ;
        (\w+);(\w+)           # $3, $4
        ; (\d)                # $5
        .+ $
    }x
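
To see the effect, here is a minimal sketch that runs the branch-reset pattern over a few lines shaped like the sample input and joins $1 and $2 with a hyphen. The numeric counter values and the host9 line are assumptions added for illustration, since the pattern expects a single digit in the counter field and the last line is there only to show a non-match.

```perl
use strict;
use warnings;

# Lines modelled on the question's sample input. The counter digits and the
# host9 line are made up for this sketch.
my @lines = (
    '[1284336000]: host1;event1;flag;state;5;errors or warnings',
    '[1284336000]: host2;event2;flag;state;7;errors or warnings',
    '[1284336000]: host2;event1;flag;state;0;errors or warnings',
    '[1284336000]: host9;event1;flag;state;1;errors or warnings',
);

# Branch reset: every alternative numbers its groups from 1 again,
# so the host is group 1 and the event is group 2 whichever branch matches.
my $re = qr{
    ^ \[ \d+ \] : \s
    (?| (host1);(event1)
      | (host2);(event2)
      | (host3);(event2)
      | (host2);(event1)
    )
    ; (\w+) ; (\w+)     # groups 3 and 4: flag and state
    ; (\d)              # group 5: counter
    .+ $
}x;

my @names;
for my $line (@lines) {
    # Lines that match none of the host/event pairs are skipped.
    push @names, "$1-$2" if $line =~ $re;
}

print "$_\n" for @names;
```

With these four lines the script prints host1-event1, host2-event2 and host2-event1; the host9 line matches none of the alternatives and is skipped.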