
Regex rule optional matching characters set in a specific point


I'm working with regex in a PCRE2 environment.

In my switch logs I need to capture a text string, named "message", that sits at a specific position. The key point is that it is always preceded by a set of characters ending with :, but after them there may or may not be some additional characters ending with ;, and I must be able to skip those.

Let me explain with my current regex and some log samples.

There are three possible formats:

 1. (s)[18014]:Recorded command information.
 2. (l):User logged out.
 3. (s)[18014]:CID=0x11aa2222;The user succeeded in logging out of XXX.

My current regex is:

\(\w+\)\[*\d*\]*\:(?<message>[^\[]+?\.)

It works for cases 1 and 2 because:

  • \(\w+\) matches the (, one or more word characters and the ) that are always present
  • \[*\d*\]* optionally matches a [, a number and a ], which are present in case 1 but absent in case 2
  • \: matches the : that always follows in every case
  • (?<message>[^\[]+?\.) captures the message into a named group; it does not capture if a [ follows the :, and the lazy +? makes the capture stop at the first .
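To make the behaviour described above concrete, here is a small sketch that runs the current pattern against the three samples. It uses Python's re module rather than PCRE2, so the named group is written as (?P<message>...), a form PCRE2 also accepts; the helper name current_message is purely illustrative.

```python
import re

# The question's current pattern. The named group uses the (?P<name>...)
# spelling, which both Python's re module and PCRE2 understand.
CURRENT = re.compile(r"\(\w+\)\[*\d*\]*\:(?P<message>[^\[]+?\.)")

SAMPLES = [
    "(s)[18014]:Recorded command information.",
    "(l):User logged out.",
    "(s)[18014]:CID=0x11aa2222;The user succeeded in logging out of XXX.",
]

def current_message(line):
    """Return the text captured in the 'message' group, or None if no match."""
    m = CURRENT.search(line)
    return m.group("message") if m else None

for line in SAMPLES:
    print(current_message(line))
```

For the third sample the captured message starts with CID=0x11aa2222;, which is exactly the problem described next.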

My problem is case 3: after the : the text always begins with CID=<hexadecimal expression>, but it is not limited to that. Other characters can follow, always terminated by ;. So for case 3 the text after the : is CID=<hex expression><other numeric and literal characters>;. With the current regex, of course, the CID part ends up in the message. I must avoid that: if the CID part is present, the message capture must start after the ; that ends it.

To summarize: IF there is no CID word after the :, start capturing; ELSE, skip everything up to the ; and start capturing after it.
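One way to express this IF/ELSE directly is an optional non-capturing group that consumes the CID block when it is there. The sketch below is an alternative to the answer's approach, not taken from the question or the answer. It assumes the block always starts with CID= and ends at the first ;, and it replaces \[*\d*\]* with the stricter optional group (?:\[\d+\])?. It is written for Python's re module; the same pattern text should also work in PCRE2.

```python
import re

# Assumption: the optional block always starts with "CID=" and ends at the first ";".
#   \(\w+\)            the "(s)" / "(l)" prefix
#   (?:\[\d+\])?       an optional "[18014]" part
#   :                  the colon that always precedes the message or the CID block
#   (?:CID=[^;]*;)?    the optional CID block, consumed but not captured
#   (?P<message>...)   the message itself, up to the first "."
PATTERN = re.compile(r"\(\w+\)(?:\[\d+\])?:(?:CID=[^;]*;)?(?P<message>[^\[]+?\.)")

SAMPLES = [
    "(s)[18014]:Recorded command information.",
    "(l):User logged out.",
    "(s)[18014]:CID=0x11aa2222;The user succeeded in logging out of XXX.",
]

def extract_message(line):
    """Return the 'message' group, skipping a leading CID=...; block if present."""
    m = PATTERN.search(line)
    return m.group("message") if m else None

for line in SAMPLES:
    print(extract_message(line))
```

Because ? is greedy, the engine first tries to consume the CID block and only skips it when the text after the : does not start with CID=, which is the IF/ELSE behaviour described above.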


Solution

  • The following pattern matches the right-hand part of your test strings.
    It looks for either a : that is not followed by CID (the negative lookahead (?!CID)) or a ;, and then captures everything that follows in the third group.

    ((:(?!CID))|;)(.*)
    

    see https://regex101.com/r/JRB4Rq/1
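As a quick check outside regex101, this sketch runs the answer's pattern unchanged through Python's re.search and reads the third group; the helper name solution_message is illustrative only.

```python
import re

# The answer's pattern, unchanged: a ":" not followed by "CID", or a ";",
# then everything after it captured in group 3.
SOLUTION = re.compile(r"((:(?!CID))|;)(.*)")

SAMPLES = [
    "(s)[18014]:Recorded command information.",
    "(l):User logged out.",
    "(s)[18014]:CID=0x11aa2222;The user succeeded in logging out of XXX.",
]

def solution_message(line):
    """Return group 3 of the first match (the text after the chosen ':' or ';')."""
    m = SOLUTION.search(line)
    return m.group(3) if m else None

for line in SAMPLES:
    print(solution_message(line))
```

In the third sample the : is rejected by (?!CID), so the search moves on and matches the ; instead. Note that (.*) takes everything up to the end of the line, whereas the question's pattern stops at the first .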