php regex pcre

PCRE character class subtraction


I have a stream of data made up of entries.

  • Each entry contains 1 mandatory field and 1 optional field.
  • Fields are separated from each other by a semicolon ;.
  • Fields may contain any printable symbol EXCEPT THE SEMICOLON ;
  • The mandatory field should be 1-60 symbols in length.
  • The optional field can be 0-60 symbols in length.

I would like to match all fields within the entries. I use a negative lookahead assertion to subtract the semicolon from the [:print:] POSIX character class, but it doesn't seem to work with length-limited fields.

My data:

[1427894078] SERV;ICE ALERT: example.com ;Current Load;CRITICAL;SOFT;3;CRITICAL - load average: 1.96, 1.29, 0.59

My regex (PCRE):

((?!;)[[:print:]]{1,60});((?!;)[[:print:]]{0,60})

What I expect to get:

Match 1:
Group 1: [1427894078] SERV
Group 2: ICE ALERT: example.com 

Match 2:
Group 1: Current Load
Group 2: CRITICAL

Match 3:
Group 1: SOFT
Group 2: 3

What I actually get (wrong):

Match 1:
Group 1: [1427894078] SERV;ICE ALERT: example.com ;Current Load
Group 2: CRITICAL;SOFT;3;CRITICAL - load average: 1.96, 1.29, 0.59

Demo: https://regex101.com/r/3uObB5/2


Solution

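  • You are very close. The only problem with your regex is that the lookahead is outside the quantified group, so it is tested only once, before the first character, and the [[:print:]]{1,60} that follows is free to consume semicolons. Put the lookahead and the character class in one group and quantify that group: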

    • (?!;)[[:print:]]{1,60} should be (?:(?!;)[[:print:]]){1,60}

    Now it matches the right chunks of characters (see live demo here):

    ((?:(?!;)[[:print:]]){1,60});((?:(?!;)[[:print:]]){0,60})
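
Since the question is tagged php, here is a minimal sketch of how the corrected pattern could be run with preg_match_all. The helper name matchFields is hypothetical and not part of the original answer. The input line is the sample from the question.

```php
<?php
// Hypothetical helper (not from the original answer): returns every
// "mandatory;optional" pair found in one line of input.
function matchFields(string $line): array
{
    // The lookahead sits inside the quantified group, so ';' is rejected
    // before each character rather than only before the first one.
    $pattern = '/((?:(?!;)[[:print:]]){1,60});((?:(?!;)[[:print:]]){0,60})/';

    preg_match_all($pattern, $line, $matches, PREG_SET_ORDER);

    // Keep only the two capture groups of each match.
    return array_map(fn(array $m): array => [$m[1], $m[2]], $matches);
}

$line = '[1427894078] SERV;ICE ALERT: example.com ;Current Load;CRITICAL;SOFT;3;'
      . 'CRITICAL - load average: 1.96, 1.29, 0.59';

foreach (matchFields($line) as $i => [$mandatory, $optional]) {
    printf("Match %d: [%s] [%s]\n", $i + 1, $mandatory, $optional);
}
```

On the sample line this should give the three matches expected in the question. The trailing "CRITICAL - load average: ..." text is not matched because no semicolon follows it.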
    

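    However, there is a better alternative (see live demo here). A negated character class can exclude both the semicolon and the Unicode "Other" category \p{C} (control, format, unassigned and similar characters), so no per-character lookahead is needed: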

    ([^\p{C};]{1,60});([^\p{C};]{0,60})
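
As a rough sketch, the negated-class pattern could be used from PHP like this. The helper name matchFieldsNegated is hypothetical, and the u modifier is an assumption that the input is valid UTF-8; the original answer does not specify any modifiers.

```php
<?php
// Hypothetical helper (not from the original answer): extracts
// "mandatory;optional" pairs using the negated character class.
function matchFieldsNegated(string $line): array
{
    // [^\p{C};] = any character that is neither in Unicode category C
    // (control, format, unassigned, ...) nor a semicolon.
    // Assumption: the "u" modifier is used because the input is UTF-8.
    $pattern = '/([^\p{C};]{1,60});([^\p{C};]{0,60})/u';

    if (preg_match_all($pattern, $line, $matches, PREG_SET_ORDER) === false) {
        return []; // matching failed, e.g. the input was not valid UTF-8
    }

    // Keep only the two capture groups of each match.
    return array_map(fn(array $m): array => [$m[1], $m[2]], $matches);
}

$line = '[1427894078] SERV;ICE ALERT: example.com ;Current Load;CRITICAL;SOFT;3;'
      . 'CRITICAL - load average: 1.96, 1.29, 0.59';

foreach (matchFieldsNegated($line) as $i => [$mandatory, $optional]) {
    printf("Match %d: [%s] [%s]\n", $i + 1, $mandatory, $optional);
}
```

With the u modifier the {1,60} and {0,60} limits count characters rather than bytes, so a field such as Größe is treated as five symbols.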