regex notepad++ emeditor

Match strings between delimiting characters


Some lines contain strings, mixed with other text, that are delimited by an opening and a closing quote (‘ and ’), like the ones below. I am trying to find a regex that matches each word or phrase inside the quotes, using the comma as the internal delimiter (or the whole quoted content if there is no comma, as with a single word or phrase). For example, for these phrases:

‘verdichten’
‘verdichten, verstopfen’
‘dunkel, finster, wolkig’
‘fort sein, verloren sein, verloren’
‘von den Nymph ergriffen, verzückt, verrückt’
‘der sich halten kann, halten kann’

The result I would like would be:

[[verdichten]]
[[verdichten]], [[verstopfen]]
[[dunkel]], [[finster]], [[wolkig]]
[[fort sein]], [[verloren sein]], [[verloren]]
[[von den Nymph ergriffen]], [[verzückt]], [[verrückt]]
[[der sich halten kann]], [[halten kann]]

It should work in Notepad++ or EmEditor.

I can match with (‘)(.+?)(’) but I cannot find a way to replace as described.
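
To illustrate the limitation, here is a small Python sketch of that pattern with a fixed replacement. Python is used only for illustration (Notepad++ and EmEditor use their own regex engines), and the function name `naive_replace` is made up for this example. Because group 2 captures the whole quoted content, the commas end up inside a single pair of brackets:

```python
import re

# The simple pattern from above: opening quote, lazy content, closing quote.
SIMPLE = re.compile(r"(‘)(.+?)(’)")

def naive_replace(text):
    # Wraps the entire quoted content once; it cannot split on the commas.
    return SIMPLE.sub(r"[[\2]]", text)

if __name__ == "__main__":
    print(naive_replace("‘dunkel, finster, wolkig’"))
```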


Solution

  • One option is to use the \G anchor and two capturing groups:

    (?:‘|\G(?!^))([^,\r\n’]+)(?=[^\r\n’]*’)(?:(,\h*)|’)
    

    In parts

    • (?: Non capturing group
      • ‘ Match the opening quote
      • | Or
      • \G(?!^) Assert position at the end of previous match, not at the start
    • ) Close non capturing group
    • ( Capture group 1
      • [^,\r\n’]+ Match 1+ times any char except a comma, a newline or ’
    • ) Close group
    • (?=[^\r\n’]*’) Positive lookahead, assert that a closing ’ follows on the right on the same line
    • (?: Non capturing group
      • (,\h*)|’ Either capture a comma and 0+ horizontal whitespace chars in group 2, or match the closing ’
    • ) Close non capturing group

    In the replacement use:

    [[$1]]$2
    

    Output

    [[verdichten]]
    [[verdichten]], [[verstopfen]]
    [[dunkel]], [[finster]], [[wolkig]]
    [[fort sein]], [[verloren sein]], [[verloren]]
    [[von den Nymph ergriffen]], [[verzückt]], [[verrückt]]
    [[der sich halten kann]], [[halten kann]]
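
For checking the expected output outside an editor, here is a minimal Python sketch. Python's built-in `re` module supports neither `\G` nor `\h`, so this does not run the pattern above. It swaps in a different technique: match each quoted group as a whole, then split on commas in a replacement callback. The names `QUOTED`, `wrap_items` and `convert` are made up for this example.

```python
import re

# One quoted group: opening ‘, content without ’ or line breaks, closing ’.
QUOTED = re.compile(r"‘([^’\r\n]+)’")

def wrap_items(match):
    # Split the quoted content on commas, trim each item, and wrap it in [[ ]].
    items = [item.strip() for item in match.group(1).split(",")]
    return ", ".join(f"[[{item}]]" for item in items if item)

def convert(text):
    # Text outside the quotes is left untouched.
    return QUOTED.sub(wrap_items, text)

if __name__ == "__main__":
    for line in ("‘verdichten’", "‘fort sein, verloren sein, verloren’"):
        print(convert(line))
```

Unlike the `[[$1]]$2` replacement, which keeps each original separator through group 2, this sketch always rejoins the items with a comma and a single space, and it drops empty items.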