
Find words in double brackets and add a character


I want to match every word inside double square brackets and append an extra character to it (using Notepad++ or EmEditor). For example, the following text:

[[test]], look at my [[test|testing]] how it tests; I have a [[test]],you know a [[test, and more test]] and [[there's another test you know]]

should become:

[[test한]], look at my [[test한|testing한]] how it tests; I have a [[test한]],you know a [[test한, and한 more한 test한]] and [[there한's한 another한 test한 you한 know한]]

So far I can only match the whole bracketed span, not the individual words inside it: \[\[.*?\]\]


Solution

  • A strict regex version will be

    (?:\G(?!\A)|\[\[)(?:(?!\[\[|]]).)*?\K\w+(?=.*?]])
    

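    This regex matches every run of one or more word characters inside [[ and ]]: each word must follow a [[ with no [[ or ]] in between, and a ]] must appear later in the text (on the same line when . matches newline is OFF). See the regex demo.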

    A less strict pattern is

    (?:\G(?!\A)|\[\[)(?:(?!\[\[|]]).)*?\K\w+
    

    Note the missing lookahead at the end. This regex matches runs of word characters inside [[ and ]], and also after an unclosed [[ up to the end of the string (or of the line, when . matches newline is OFF). See this regex demo.
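Python's built-in re module supports neither \G nor \K, so the two patterns above cannot be passed to re.sub as they are. The sketch below emulates both with the standard library only, assuming . matches newline is OFF (Python's default). It takes each [[ segment up to the next [[, ]] or line end, appends the mark to every \w+ run in it and, in strict mode, leaves the segment untouched unless a ]] follows later on the line, which is what the trailing (?=.*?]]) lookahead checks. The helper name tag_words and its parameters are hypothetical, chosen only for illustration.

```python
import re

# [[ followed by as many chars as possible that do not start [[ or ]];
# "." stops at line breaks, like Notepad++ with ". matches newline" OFF.
SEGMENT = re.compile(r"\[\[(?:(?!\[\[|]]).)*")
WORD = re.compile(r"\w+")
# Plays the role of the trailing (?=.*?]]) lookahead of the strict pattern.
CLOSER_AHEAD = re.compile(r".*?]]")


def tag_words(text: str, mark: str = "한", strict: bool = True) -> str:
    """Append `mark` to every word inside [[...]] (hypothetical helper)."""

    def rewrite(seg: re.Match) -> str:
        # The segment itself contains no ]], so every word in it gets the
        # same lookahead result: check once, from the segment's end.
        if strict and not CLOSER_AHEAD.match(text, seg.end()):
            return seg.group()
        return WORD.sub(lambda w: w.group() + mark, seg.group())

    return SEGMENT.sub(rewrite, text)


sample = ("[[test]], look at my [[test|testing]] how it tests; I have a [[test]],"
          "you know a [[test, and more test]] and [[there's another test you know]]")
print(tag_words(sample))
print(tag_words("a [[b c"))                # unclosed [[: left unchanged
print(tag_words("a [[b c", strict=False))  # less strict: a [[b한 c한
```

Like the original lookahead, strict mode only checks that some ]] follows, not that it closes this particular [[, so tag_words("[[a [[b]]") tags both a and b.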

    If your text contains only well-balanced brackets, you can use a simpler regex that matches any word followed by zero or more characters other than [ and ], and then ]]:

    \w+(?=[^][]*]])
    

    See this regex demo.

    In all three cases the replacement is $0한, where $0 represents the whole match value.

    Pattern details

    • (?:\G(?!\A)|\[\[) - either the end of the preceding match ((?!\A) prevents \G from also matching at the start of the string) or [[
    • (?:(?!\[\[|]]).)*? - zero or more, but as few as possible, occurrences of any single char (other than line break chars if . matches newline is OFF, else including them) that does not start a [[ or ]] sequence. This keeps the match between [[ and ]], but on its own it does not require a closing ]] to follow
    • \K - a match reset operator that discards the text matched so far from the overall match value
    • \w+ - one or more word chars
    • (?=.*?]]) - a positive lookahead that requires zero or more chars, as few as possible (other than line break chars if the . matches newline option is OFF, else including them), followed by ]].
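The balanced-brackets pattern \w+(?=[^][]*]]) needs neither \G nor \K, so it also works with Python's standard re module. Two details differ from Notepad++: Python spells the whole-match replacement token \g<0> instead of $0, and this sketch escapes the brackets inside the character class ([^\[\]]) for readability. A minimal sketch, assuming every [[ in the input is properly closed:

```python
import re

# A word followed by any non-bracket chars and then ]], i.e. \w+(?=[^][]*]]).
BALANCED = re.compile(r"\w+(?=[^\[\]]*]])")

text = "[[test]], look at my [[test|testing]] how it tests"
# \g<0> is Python's equivalent of the $0 (whole match) replacement token.
print(BALANCED.sub(r"\g<0>한", text))
```

Because this pattern only looks forward, a stray ]] with no opening [[ still tags the words before it (for example, "foo]]" becomes "foo한]]"), which is why the \G-based patterns are the safer choice for messy input.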