Some lines contain, among other text, strings delimited by an opening quote ‘ and a closing quote ’, like the ones below. I am trying to find a regex that matches each word or phrase, using the comma as the internal delimiter (or the whole quoted content if there is no comma, as with a single word or phrase). For example, for these phrases:
‘verdichten’
‘verdichten, verstopfen’
‘dunkel, finster, wolkig’
‘fort sein, verloren sein, verloren’
‘von den Nymph ergriffen, verzückt, verrückt’
‘der sich halten kann, halten kann’
The result I would like is:
[[verdichten]]
[[verdichten]], [[verstopfen]]
[[dunkel]], [[finster]], [[wolkig]]
[[fort sein]], [[verloren sein]], [[verloren]]
[[von den Nymph ergriffen]], [[verzückt]], [[verrückt]]
[[der sich halten kann]], [[halten kann]]
It should work in Notepad++ or EmEditor.
I can match the quoted strings with (‘)(.+?)(’), but I cannot find a way to replace them as described.
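Outside these editors, for example in a Python script, a plain match is enough because the replacement can be a function that splits the quoted content on commas. This is a minimal sketch, assuming Python 3.8+ and the standard re module. QUOTED, wrap_items and convert are illustrative names. The pattern uses [^’\r\n]+ instead of .+? so that a match cannot cross a closing quote or a line break.

```python
import re

# Like (‘)(.+?)(’), but only the content between ‘ and ’ is captured,
# and it may not contain ’ or a line break.
QUOTED = re.compile(r"‘([^’\r\n]+)’")


def wrap_items(match: re.Match) -> str:
    # Split the quoted content on commas, trim surrounding blanks,
    # and wrap every item in [[...]].
    items = [item.strip() for item in match.group(1).split(",")]
    return ", ".join(f"[[{item}]]" for item in items)


def convert(text: str) -> str:
    # re.sub accepts a function as the replacement; it is called once per match.
    return QUOTED.sub(wrap_items, text)


print(convert("‘dunkel, finster, wolkig’"))
```

Unlike a pure search-and-replace, this rewrites the separator after every comma to a single ", ", whatever spacing the source had.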
One option is to use the \G anchor with 2 capturing groups:
(?:‘|\G(?!^))([^,\r\n’]+)(?=[^\r\n’]*’)(?:(,\h*)|’)
In parts:
(?:             Non-capturing group
  ‘             Match ‘
  |             Or
  \G(?!^)       Assert the position at the end of the previous match, but not at the start of a line
)               Close the non-capturing group
(               Capture group 1
  [^,\r\n’]+    Match 1+ times any char except a comma, ’ or a newline
)               Close group 1
(?=[^\r\n’]*’)  Positive lookahead, assert that a ’ follows further on the same line
(?:             Non-capturing group
  (,\h*)|’      Either capture a comma and 0+ horizontal whitespace chars in group 2, or match ’
)               Close the non-capturing group
In the replacement use:
[[$1]]$2
Output
[[verdichten]]
[[verdichten]], [[verstopfen]]
[[dunkel]], [[finster]], [[wolkig]]
[[fort sein]], [[verloren sein]], [[verloren]]
[[von den Nymph ergriffen]], [[verzückt]], [[verrückt]]
[[der sich halten kann]], [[halten kann]]
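To try the chained matching outside the editors, it can be replayed by hand in Python. Python's built-in re module supports neither \G nor \h, so this is a hand-written emulation, not the editors' regex engine. START and CONTINUE are hypothetical names for the ‘ branch and the \G(?!^) branch, and [ \t]* stands in for \h*. One deliberate difference: the chain only continues after a comma, so a closing ’ always ends it. With the pattern exactly as written, \G may also let a match start right after a closing ’, which can matter when one line holds two quoted strings.

```python
import re

# Python's re has no \G and no \h, so this sketch emulates the chain:
#   START    ~ the ‘ branch of the editor pattern
#   CONTINUE ~ the \G(?!^) branch, tried exactly where the last match ended
# [ \t]* stands in for \h* (space and tab only; an assumption of this sketch).
START = re.compile(r"‘([^,\r\n’]+)(?=[^\r\n’]*’)(?:(,[ \t]*)|’)")
CONTINUE = re.compile(r"([^,\r\n’]+)(?=[^\r\n’]*’)(?:(,[ \t]*)|’)")


def replace_chained(text: str) -> str:
    out = []
    pos = 0
    while (m := START.search(text, pos)) is not None:
        out.append(text[pos:m.start()])  # copy the text before the opening ‘
        while m is not None:
            # Same as the replacement [[$1]]$2; group 2 is None after ’.
            out.append(f"[[{m.group(1)}]]{m.group(2) or ''}")
            pos = m.end()
            # Continue the chain only after a comma, so a closing ’ ends it.
            m = CONTINUE.match(text, pos) if m.group(2) else None
    out.append(text[pos:])  # copy the rest of the text
    return "".join(out)


print(replace_chained("‘von den Nymph ergriffen, verzückt, verrückt’"))
```

Pattern.match(text, pos) only succeeds if the match begins exactly at pos, which is what gives the inner loop its \G-like behavior.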