How to convert a regex from PCRE to POSIX format when Postfix reports "repetition-operator operand invalid"?


I am trying to anonymize Received headers for messages relayed from authenticated Postfix users, using this example from https://we.riseup.net/debian/anonymizing-postfix:

/^Received: from (.* \([-._[:alnum:]]+ \[[.[:digit:]]{7,15}\]\)).*?([[:space:]]+).*\(Authenticated sender: ([^)]+)\).*by (auk\.riseup\.net) \(([^)]+)\) with (E?SMTPS?A?) id ([A-F[:digit:]]+).*/ REPLACE Received: from [127.0.0.1] (localhost [127.0.0.1])$2(Authenticated sender: $3)${2}with $6 id $7

With this rule in the file used as regexp:/etc/postfix/header_checks, the result is an error message:

line 15: repetition-operator operand invalid

My guess is that the above regex is in PCRE format, whereas my Postfix expects a POSIX-compatible regular expression.

How to make the above regular expression POSIX regexp compliant for use in a Postfix header_checks file?


Solution

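  • Your hunch is correct: .*? is a PCRE construct. .* is the normal "any character, as many times as possible, at least zero times", and the trailing question mark changes that to "... as few times as possible ...". In a POSIX regular expression that question mark is read as a second repetition operator applied directly to the first one, and SUSv4 says: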

    The behavior of multiple adjacent duplication symbols ( '+' , '*' , '?' , and intervals) produces undefined results.

    I haven't studied the pattern in detail, but you should be able to work around this particular incompatibility. The next subpattern is ([[:space:]]+), so the lazy .*? can be reformulated as "any number of non-space characters, followed by the whitespace":

    [^[:space:]]*([[:space:]]+)
    

    Or maybe just get rid of the problem by omitting the question mark; the space-eater is followed by another .* after all.
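
    Put together, the first suggestion might look like the rule below. This is only a sketch: it is the rule from the question with .*? swapped for [^[:space:]]*, everything else left untouched, and it has not been tested against a live Postfix. The comment lines are additions for illustration.

    ```
    # Sketch, assuming the rest of the original rule is already valid POSIX ERE.
    # Only change: ".*?" before ([[:space:]]+) became "[^[:space:]]*".
    /^Received: from (.* \([-._[:alnum:]]+ \[[.[:digit:]]{7,15}\]\))[^[:space:]]*([[:space:]]+).*\(Authenticated sender: ([^)]+)\).*by (auk\.riseup\.net) \(([^)]+)\) with (E?SMTPS?A?) id ([A-F[:digit:]]+).*/ REPLACE Received: from [127.0.0.1] (localhost [127.0.0.1])$2(Authenticated sender: $3)${2}with $6 id $7
    ```

    Note that the two workarounds are not exactly equivalent to the PCRE original. [^[:space:]]* only skips non-space characters, and a greedy .* may leave $2 holding a different piece of whitespace than the lazy version did, so it is worth checking the rewritten header either way. One way to do that without sending mail is to feed a sample header to postmap -q with regexp:/etc/postfix/header_checks as the table and see whether the REPLACE action comes back as expected.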