regex, concatenation, notepad++, text-processing

How to concatenate lines only if they begin with same prefix


I have a large text made up of lines which are prefixed by a "key" word and a delimiter. I want to concatenate lines which have the same key word. The resulting line should have:

  • the key word
  • the delimiter
  • the concatenation of the segments that follow the key word and delimiter on each line

For example, this input:

word1|word3 word10 word4 word9 word8 word2 word3 word7 word5 word6
word1|word5 word8 word9 word6 word2 word7 word6 word10 word4 word5
word1|word8 word2 word3 word10 word4 word9 word7 word3 word5 word6
word1|word7 word6 word5 word8 word9 word6 word2 word10 word4 word5

should result in:

word1|word3 word10 word4 word9 word8 word2 word3 word7 word5 word6 word5 word8 word9 word6 word2 word7 word6 word10 word4 word5 word8 word2 word3 word10 word4 word9 word7 word3 word5 word6 word7 word6 word5 word8 word9 word6 word2 word10 word4 word5

Is it possible to do this in Notepad++ with a regex? If so, how?


Solution

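  • You can try this solution, with Search Mode set to "Regular expression" and ". matches newline" unchecked (the pattern relies on .* stopping at the end of a line):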

    • Find what
    ^(([^\|]+\|)(?:.*))((?:(?!^\2)[\s\S])*)(?:^\2(.*))\s*
    
    • Replace with
    \1 \4\3
    

    This cannot be done with a single replacement, since the number of lines sharing a prefix is not fixed and each match merges only two of them. You need to hit the Replace All button repeatedly until there are no more changes.

    For this example, you need to run Replace All twice to get the desired output.

    It also works when other lines appear between the lines that share a prefix.
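
    To see why repeated passes are needed, here is a minimal sketch that applies the same pattern and replacement outside Notepad++, using Python's re module. The function name merge_by_prefix and the sample data are made up for illustration, and Python's regex engine is not the one Notepad++ uses, so treat this as an approximation of the editor's behaviour rather than an exact reproduction.

```python
import re

# Same pattern as in "Find what". re.MULTILINE makes ^ match at the start
# of every line, and "." does not match newlines by default.
PATTERN = re.compile(
    r'^(([^\|]+\|)(?:.*))((?:(?!^\2)[\s\S])*)(?:^\2(.*))\s*',
    re.MULTILINE,
)


def merge_by_prefix(text: str) -> str:
    """Emulate pressing Replace All until a pass changes nothing.

    merge_by_prefix is a hypothetical helper name, not part of any library.
    """
    while True:
        # One pass: each match appends the next same-prefix line's content
        # (\4) to the first line (\1) and keeps the lines in between (\3).
        merged = PATTERN.sub(r'\1 \4\3', text)
        if merged == text:
            return text
        text = merged


sample = "word1|a b\nword1|c d\nword1|e f\nword1|g h\n"
print(merge_by_prefix(sample), end="")
# prints: word1|a b c d e f g h
```

    With four lines sharing a prefix, the first pass merges them pairwise into two lines and the second pass merges those two, which matches the two Replace All presses mentioned above.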