php regex preg-match-all

I want a regex pattern to grab text within a pattern that excludes matches followed by a particular string


I'm using preg_match_all() like this:

preg_match_all('/("user":"(.*?)".*?-->)/', $input_lines, $output_array);

The idea is to get whatever comes after "user" in a commented-out block (it's a long story). So let's say $input_lines is something like:

<!-- block {"user":"josh-hart", 1234566 ,"hide":false} /-->
<!-- block {"user":"jalen-brunson", 7744633 ,"hide":true} /-->
<!-- block {"user":"julius-randle", 333333,"hide":false} /-->
<!-- block {"user":"obi-toppin", 4hh3n33n33n, /-->
<!-- block {"user":"rj-barrett", nmremxxx!! ,"hide":true} /-->
<!-- block {"user":"mitch-robinson",yahaoao /-->

I want it to match the user. But here's the thing: I only want the user if "hide":true does not appear before the /-->. So for this string I would want the matches to be:

josh-hart, julius-randle, obi-toppin, mitch-robinson

What is this called in regex terms and how do I do it?


Solution

  • Assuming that a comment runs from <!-- to --> and that neither marker is used for anything else in between, you can first get the comments that contain "hide":true out of the way by matching <!-- ... "hide":true ... --> without crossing the opening or closing of a comment.

    Then you can get a single match of the username, still in between the comment markers and independent of the order of appearance.

    <!--(?:(?!-->|"hide":).)*+"hide":true\b(?:(?!-->).)*/-->(*SKIP)(*F)|<!--(?:(?!-->|"user":).)*"user":"\K[^"]+(?="(?:(?!-->).)*-->)
    

    The pattern matches:

    • <!-- Match literally
    • (?:(?!-->|"hide":).)*+ Possessively repeat, zero or more times, any character at which neither --> nor "hide": starts, using a tempered greedy token
    • "hide":true\b Match "hide":true followed by a word boundary to prevent a partial word match
    • (?:(?!-->).)*/--> Match up to and including the closing /-->
    • (*SKIP)(*F) Fail the match on purpose and skip past the text consumed so far, so nothing inside this comment can be matched
    • | Or
    • <!-- Match literally
    • (?:(?!-->|"user":).)* Repeat, zero or more times, any character at which neither --> nor "user": starts
    • "user":"\K Match "user":" and forget what is matched so far
    • [^"]+ Match 1+ chars other than " (the username that you want to match)
    • (?="(?:(?!-->).)*-->) Positive lookahead: assert the closing " and then the first --> to the right, so the username lies inside a comment
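
To tie the breakdown back to the original call, here is a minimal sketch of how the pattern could be passed to preg_match_all(). It assumes ~ as the delimiter (so the / in /--> needs no escaping) and reuses the sample input and variable names from the question. The pattern is split across two string literals only for readability.

```php
<?php
// Sample input copied from the question.
$input_lines = <<<'TXT'
<!-- block {"user":"josh-hart", 1234566 ,"hide":false} /-->
<!-- block {"user":"jalen-brunson", 7744633 ,"hide":true} /-->
<!-- block {"user":"julius-randle", 333333,"hide":false} /-->
<!-- block {"user":"obi-toppin", 4hh3n33n33n, /-->
<!-- block {"user":"rj-barrett", nmremxxx!! ,"hide":true} /-->
<!-- block {"user":"mitch-robinson",yahaoao /-->
TXT;

// First alternative: consume a whole comment containing "hide":true, then
// discard it with (*SKIP)(*F). Second alternative: match the username.
$pattern = '~<!--(?:(?!-->|"hide":).)*+"hide":true\b(?:(?!-->).)*/-->(*SKIP)(*F)'
         . '|<!--(?:(?!-->|"user":).)*"user":"\K[^"]+(?="(?:(?!-->).)*-->)~';

preg_match_all($pattern, $input_lines, $output_array);

// Because of \K, the full match in $output_array[0] is only the username.
echo implode(', ', $output_array[0]), PHP_EOL;
// josh-hart, julius-randle, obi-toppin, mitch-robinson
```

Since there are no capture groups, the usernames are in $output_array[0] rather than in a numbered group.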

    Note that you can make the matching of the username more specific. As written, [^"]+ matches 1 or more characters other than a double quote, which can also be a space or a newline. If you want to match only non-whitespace characters other than a double quote, then you can change it to [^\s"]+
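
As a rough illustration of that difference, the sketch below runs both character classes against a made-up comment whose username contains a space. The helper username_pattern() and the sample string are hypothetical and only exist for this comparison.

```php
<?php
// Hypothetical helper: builds the pattern with a given character class
// for the username part.
function username_pattern(string $nameClass): string
{
    return '~<!--(?:(?!-->|"hide":).)*+"hide":true\b(?:(?!-->).)*/-->(*SKIP)(*F)'
         . '|<!--(?:(?!-->|"user":).)*"user":"\K' . $nameClass
         . '+(?="(?:(?!-->).)*-->)~';
}

// Made-up input: the username contains a space.
$sample = '<!-- block {"user":"josh hart", 1234566 ,"hide":false} /-->';

$looseCount  = preg_match(username_pattern('[^"]'), $sample, $loose);
$strictCount = preg_match(username_pattern('[^\s"]'), $sample, $strict);

// [^"]+ accepts the space and returns the whole value.
var_dump($looseCount, $loose);
// [^\s"]+ cannot reach the closing quote, so there is no match at all.
var_dump($strictCount, $strict);
```

With the stricter class, a username containing whitespace is rejected entirely rather than truncated, because the lookahead still requires the closing double quote right after the name.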

    See a regex demo.