php, regex, preg-match

Match specific not between tags


I have some regular expressions that wrap content in tags, as shown in the result below. If I apply the same expressions to the resulting text, I get tags inside tags.

ORIGINAL CONTENT:

Lorem ipsum 123456 dolor sit @twitter amet, consectetur adipiscing elit example .

RESULT:

Lorem ipsum [tel]123456[/tel] dolor sit [tw]@twitter[/tw] amet, consectetur adipiscing elit [a]example[/a] .

RESULT SECOND TIME:

Lorem ipsum [tel][tel]123456[/tel][/tel] dolor sit [tw][tw]@twitter[/tw][/tw] amet, consectetur adipiscing elit [a][a]example[/a][/a] .

What should I add to my regular expressions so that they do not match content that is already between any [] and [/]?


Solution

  • Description

    (?:[0-9]+|twitter|consectetur)(?![0-9a-z]*\[\/[a-z]+\])
    

    Replace with: [XX]$0[/XX]

    This regular expression does the following:

    • finds all strings of digits, the word twitter, and the word consectetur. I selected these substrings to illustrate the regular expression, but they could be replaced with other strings.
    • verifies that the match is not already followed by a closing tag
    • avoids these edge cases:
      • on its own, [0-9]+ can match a partial run such as 2345 inside 123456, which is in the source string but already wrapped in tags; the [0-9a-z]* in the lookahead skips the remaining characters so the closing tag is still found
      • twitter matched without its leading @ is still followed by a closing tag, so it is rejected as well
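
The points above can be checked one input at a time. The following is a minimal sketch rather than part of the original answer: the helper name `hasUntaggedMatch` and the test inputs are made up for illustration, and `~` is used as the pattern delimiter.

```php
<?php
// Pattern from the answer, wrapped in '~' delimiters.
const GUARDED_PATTERN = '~(?:[0-9]+|twitter|consectetur)(?![0-9a-z]*\[\/[a-z]+\])~';

// Hypothetical helper: true when the pattern finds something it would wrap.
function hasUntaggedMatch(string $text): bool
{
    return preg_match(GUARDED_PATTERN, $text) === 1;
}

var_dump(hasUntaggedMatch('123456'));             // bare number
var_dump(hasUntaggedMatch('[tel]123456[/tel]'));  // already tagged, partial runs like 2345 included
var_dump(hasUntaggedMatch('[tw]@twitter[/tw]'));  // 'twitter' sits directly before [/tw]
var_dump(hasUntaggedMatch('@twitter'));           // untagged handle
// Limitation: the lookahead only skips lowercase letters and digits, so a
// number followed by a space inside a tag is still reported as a match.
var_dump(hasUntaggedMatch('[tel]123456 ext[/tel]'));
```

The last case suggests that the guard relies on the closing tag following the match with only lowercase letters or digits in between, so it may not cover tags that wrap longer phrases.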

    Example

    Live Demo

    https://regex101.com/r/lW2pY6/1

    Sample Text

    123456 Lorem ipsum [tel]123456[/tel] dolor sit [tw]@twitter[/tw] amet, consectetur adipiscing elit [a]example[/a]

    Sample After Replacement

    [XX]123456[/XX] Lorem ipsum [tel]123456[/tel] dolor sit [tw]@twitter[/tw] amet, [XX]consectetur[/XX] adipiscing elit [a]example[/a]
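
Since the question is tagged php, here is a rough sketch of the same replacement with `preg_replace`, run twice to see whether the second pass leaves the text alone. It assumes a lowercase tag name (`xx`) instead of the uppercase one in the sample, because the lookahead's `[a-z]+` only recognizes lowercase closing tags unless the `i` modifier is added. The variable names are illustrative.

```php
<?php
// Pattern from the answer, wrapped in '~' delimiters.
$pattern = '~(?:[0-9]+|twitter|consectetur)(?![0-9a-z]*\[\/[a-z]+\])~';

// Sample text from the answer.
$text = '123456 Lorem ipsum [tel]123456[/tel] dolor sit [tw]@twitter[/tw] amet, '
      . 'consectetur adipiscing elit [a]example[/a]';

// First pass: $0 in the replacement stands for the whole match.
$once = preg_replace($pattern, '[xx]$0[/xx]', $text);

// Second pass over the already-tagged output.
$twice = preg_replace($pattern, '[xx]$0[/xx]', $once);

echo $once, PHP_EOL;
var_dump($once === $twice); // true when the second pass changed nothing
```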

    Explanation

    NODE                     EXPLANATION
    ----------------------------------------------------------------------
      (?:                      group, but do not capture:
    ----------------------------------------------------------------------
        [0-9]+                   any character of: '0' to '9' (1 or more
                                 times (matching the most amount
                                 possible))
    ----------------------------------------------------------------------
       |                        OR
    ----------------------------------------------------------------------
        twitter                  'twitter'
    ----------------------------------------------------------------------
       |                        OR
    ----------------------------------------------------------------------
        consectetur              'consectetur'
    ----------------------------------------------------------------------
      )                        end of grouping
    ----------------------------------------------------------------------
      (?!                      look ahead to see if there is not:
    ----------------------------------------------------------------------
        [0-9a-z]*                any character of: '0' to '9', 'a' to 'z'
                                 (0 or more times (matching the most
                                 amount possible))
    ----------------------------------------------------------------------
        \[                       '['
    ----------------------------------------------------------------------
        \/                       '/'
    ----------------------------------------------------------------------
        [a-z]+                   any character of: 'a' to 'z' (1 or more
                                 times (matching the most amount
                                 possible))
    ----------------------------------------------------------------------
        \]                       ']'
    ----------------------------------------------------------------------
      )                        end of look-ahead
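
To tie the lookahead back to the tags in the question, it can be appended to each tagging rule. The sketch below is only an illustration: the question does not show its actual expressions, so the rules in `TAG_RULES`, the constant names, and the function `applyTags` are all assumptions.

```php
<?php
// Lookahead from the answer: rejects a match that is followed (after optional
// lowercase letters or digits) by a closing tag such as [/tel].
const NOT_BEFORE_CLOSE_TAG = '(?![0-9a-z]*\[\/[a-z]+\])';

// Hypothetical rules (tag name => sub-pattern), stand-ins for the asker's own.
const TAG_RULES = [
    'tel' => '[0-9]+',
    'tw'  => '@[a-z0-9]+', // kept to characters the lookahead is able to skip
];

// Applies every rule once, wrapping only matches that are not yet tagged.
function applyTags(string $text): string
{
    foreach (TAG_RULES as $tag => $subPattern) {
        $pattern = '~(?:' . $subPattern . ')' . NOT_BEFORE_CLOSE_TAG . '~';
        $replacement = sprintf('[%s]$0[/%s]', $tag, $tag);
        $text = preg_replace($pattern, $replacement, $text);
    }
    return $text;
}

$original = 'Lorem ipsum 123456 dolor sit @twitter amet, consectetur adipiscing elit example .';
$first  = applyTags($original);
$second = applyTags($first);

echo $first, PHP_EOL;
var_dump($first === $second); // true when re-applying the rules adds no nested tags
```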