
Regex Negation: Handling conditional if statements that cancel the match if fulfilled


Say I have this text:

hello world **ant*** lorem **cat** opposum** *** ***antelope*** *rabbit __dog__

I would like to match strings that are wrapped in either ** or __, with nothing extra around them. In the text above, the only matches I want are "cat" and "dog". This means I have to cancel (negate) the match if there are extra surrounding delimiter characters; for example, ***dog** or __dog___ should fail.

I've tried to solve this with a negative lookaround (http://www.regular-expressions.info/lookaround.html), to no avail.

Here's the current pattern I have, along with the checks around it:

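// Hypothetical wrapper function so that the early `return`s are legal.
function findAnnotation(text) {
    const pattern = /([^*])\*(\w+)\*([^*])/g;
    const match = pattern.exec(text);
    if (match === null) {
        return; // nothing matched at all
    }
    const annotatedText = match[0];
    const matchedText = match[2]; // group 2 is the (\w+) word; group 1 is the preceding character

    // Return if annotatedText is a possible match for bolditalic
    if (annotatedText.startsWith("***") || annotatedText.startsWith("___")) {
        return;
    }
    // Return if the matchedText has spaces in between
    if (/\s/.test(matchedText)) {
        return;
    }
    // Return if the text consists only of markers and whitespace
    if (/^[*_ \n]+$/.test(text)) {
        return;
    }
    return matchedText; // all checks passed
}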

Essentially, I want to drop these JavaScript string checks and express the same logic in the JavaScript regex pattern itself.


Solution

  • Use

    /(?<=(?<!\*)\*\*)\w+(?=\*\*(?!\*))|(?<=(?<!_)__)\w+(?=__(?!_))/gi
    


    Explanation

    --------------------------------------------------------------------------------
      (?<=                     look behind to see if there is:
    --------------------------------------------------------------------------------
        (?<!                     look behind to see if there is not:
    --------------------------------------------------------------------------------
          \*                       '*'
    --------------------------------------------------------------------------------
        )                        end of look-behind
    --------------------------------------------------------------------------------
        \*                       '*'
    --------------------------------------------------------------------------------
        \*                       '*'
    --------------------------------------------------------------------------------
      )                        end of look-behind
    --------------------------------------------------------------------------------
      \w+                      word characters (a-z, A-Z, 0-9, _) (1 or
                               more times (matching the most amount
                               possible))
    --------------------------------------------------------------------------------
      (?=                      look ahead to see if there is:
    --------------------------------------------------------------------------------
        \*                       '*'
    --------------------------------------------------------------------------------
        \*                       '*'
    --------------------------------------------------------------------------------
        (?!                      look ahead to see if there is not:
    --------------------------------------------------------------------------------
          \*                       '*'
    --------------------------------------------------------------------------------
        )                        end of look-ahead
    --------------------------------------------------------------------------------
      )                        end of look-ahead
    --------------------------------------------------------------------------------
     |                        OR
    --------------------------------------------------------------------------------
      (?<=                     look behind to see if there is:
    --------------------------------------------------------------------------------
        (?<!                     look behind to see if there is not:
    --------------------------------------------------------------------------------
          _                        '_'
    --------------------------------------------------------------------------------
        )                        end of look-behind
    --------------------------------------------------------------------------------
        __                       '__'
    --------------------------------------------------------------------------------
      )                        end of look-behind
    --------------------------------------------------------------------------------
      \w+                      word characters (a-z, A-Z, 0-9, _) (1 or
                               more times (matching the most amount
                               possible))
    --------------------------------------------------------------------------------
      (?=                      look ahead to see if there is:
    --------------------------------------------------------------------------------
        __                       '__'
    --------------------------------------------------------------------------------
        (?!                      look ahead to see if there is not:
    --------------------------------------------------------------------------------
          _                        '_'
    --------------------------------------------------------------------------------
        )                        end of look-ahead
    --------------------------------------------------------------------------------
      )                        end of look-ahead
    

    JavaScript code:

    const string = 'hello world **ant*** lorem **cat** opposum** *** ***antelope*** *rabbit __dog__';
    // Logs an array containing only "cat" and "dog"
    console.log(string.match(/(?<=(?<!\*)\*\*)\w+(?=\*\*(?!\*))|(?<=(?<!_)__)\w+(?=__(?!_))/gi));
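Two caveats about the pattern above, each with a sketch. First, `\w` also matches `_`, so in the underscore branch `\w+` can absorb a third underscore: traced by hand, `__dog___` yields `dog_` and `___dog__` yields `_dog`, although the question says such inputs should fail. One possible adjustment (ours, not part of the original answer) is to add `(?!_)` before and `(?<!_)` after `\w+` in that branch, so the word cannot start or end with `_` while inner underscores such as `__a_b__` are still allowed. Second, lookbehind only arrived in JavaScript with ES2018, and some engines (Safari in particular) supported it much later. The fallback below avoids lookbehind by consuming the preceding character (or the start of the string) and reading the word from a capture group. The names `TIGHTENED`, `NO_LOOKBEHIND` and `findDelimitedWords` are hypothetical.

```javascript
// Answer's pattern, unchanged.
const ORIGINAL = /(?<=(?<!\*)\*\*)\w+(?=\*\*(?!\*))|(?<=(?<!_)__)\w+(?=__(?!_))/gi;

// Hypothetical tightened variant: in the "__" branch, (?!_) and (?<!_)
// keep the matched word from starting or ending with "_".
const TIGHTENED = /(?<=(?<!\*)\*\*)\w+(?=\*\*(?!\*))|(?<=(?<!_)__)(?!_)\w+(?<!_)(?=__(?!_))/gi;

// Hypothetical fallback without lookbehind.
// (^|[^*]) and (^|[^_]): start of input or a non-delimiter character (consumed).
// Group 2: word between "**"; group 4: word between "__" whose first and
// last characters are not "_" ([^\W_] means a letter or digit).
const NO_LOOKBEHIND = /(^|[^*])\*\*(\w+)\*\*(?!\*)|(^|[^_])__([^\W_](?:\w*[^\W_])?)__(?!_)/g;

function findDelimitedWords(text) {
  const words = [];
  // matchAll requires the g flag and yields one match object per hit.
  for (const m of text.matchAll(NO_LOOKBEHIND)) {
    // Exactly one of the two word groups participates in each match.
    words.push(m[2] ?? m[4]);
  }
  return words;
}

const samples = [
  'hello world **ant*** lorem **cat** opposum** *** ***antelope*** *rabbit __dog__',
  '__dog___',
  '___dog__',
  '__a_b__',
];

for (const s of samples) {
  // With the g flag, String.prototype.match returns every match, or null.
  console.log(JSON.stringify(s), s.match(ORIGINAL), s.match(TIGHTENED), findDelimitedWords(s));
}
```

Because the fallback consumes the character before the opening delimiter, a closing delimiter cannot double as the next opening one: for `**a**b**` it returns only `a`, while the lookbehind pattern also returns `b`.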