
How to merge lines that start with the same items in a text file


I have a text file containing a few thousand lines like this:

File:

abc: bla1 bla1 bla1... 
cde: bla bla bla... 
ghk: bla1 bla1 bla1... 
lmn: bla bla bla...
abc: bla2 bla2 bla2... 
bcd: bla bla bla... 
ghk: bla2 bla2 bla2... 
xyz: bla bla bla...

I want to merge all the lines that start with the same item (such as lines 1 and 5, or lines 3 and 7) so that I get a new text file like this:

New File:

abc: bla1 bla1 bla1... * abc: bla2 bla2 bla2... 
cde: bla bla bla... 
ghk: bla1 bla1 bla1... * ghk: bla2 bla2 bla2...
lmn: bla bla bla...
bcd: bla bla bla...   
xyz: bla bla bla...

Can this be solved using regex and/or grep, and if so, how?

I'm quite familiar with grep because I use TextWrangler, but I'm also OK with other text editors.

Help much appreciated.


Solution

  • If order doesn't matter, I suggest first sorting the text. That will place

    abc: ...
    abc: ...
    

    next to one another. Then run this search-and-replace, repeating it until it finds no more matches (a key that occurs more than twice needs more than one pass):

    Search:
      ^(\w+): (.*)\n\1: 
    Replace:
      \1: \2 
    
    Result:
       abc: bla1 bla1 bla1... bla2 bla2 bla2...
       bcd: bla bla bla...
       cde: bla bla bla...
       ghk: bla1 bla1 bla1... bla2 bla2 bla2...
       lmn: bla bla bla...
       xyz: bla bla bla...
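
For anyone who would rather script this than click through passes in an editor, here is a minimal Python sketch of the sort-then-merge approach. The search pattern and replacement are the ones shown above; the function name `merge_sorted` and the choice to strip trailing whitespace and drop blank lines are assumptions of mine, not something the editor does.

```python
import re

# Same search pattern as above; MULTILINE makes ^ match at every line start.
ADJACENT = re.compile(r"^(\w+): (.*)\n\1: ", re.MULTILINE)


def merge_sorted(text: str) -> str:
    """Sort the lines, then merge adjacent lines that share a key.

    Hypothetical helper: trailing whitespace is stripped and blank
    lines are dropped before sorting (an assumption, for tidy output).
    """
    lines = sorted(line.rstrip() for line in text.splitlines() if line.strip())
    merged = "\n".join(lines) + "\n"
    while True:
        # subn returns the new text and the number of replacements made.
        merged, count = ADJACENT.subn(r"\1: \2 ", merged)
        if count == 0:  # nothing left to merge
            return merged


if __name__ == "__main__":
    sample = (
        "abc: bla1\ncde: bla\nghk: bla1\nlmn: bla\n"
        "abc: bla2\nbcd: bla\nghk: bla2\nxyz: bla\n"
    )
    print(merge_sorted(sample), end="")
```

The loop simply repeats the replace-all until a pass changes nothing, which is the scripted equivalent of "a few passes".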
    

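    If order DOES matter, run this regex instead, again repeating it until it finds no more matches. Each pass moves a later line up to join the first line with the same key and leaves the lines in between where they were: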

    Search:
      ^(\w+): (.*)\n((?:(?!\1: ).*\n)*)\1: (.*\n)
    Replace:
      \1: \2 \4\3
    
    Result (1st pass):
      abc: bla1 bla1 bla1... bla2 bla2 bla2...
      cde: bla bla bla...
      ghk: bla1 bla1 bla1...
      lmn: bla bla bla...
      bcd: bla bla bla...
      ghk: bla2 bla2 bla2...
      xyz: bla bla bla...
    
    Result (2nd pass):
      abc: bla1 bla1 bla1... bla2 bla2 bla2...
      cde: bla bla bla...
      ghk: bla1 bla1 bla1... bla2 bla2 bla2...
      lmn: bla bla bla...
      bcd: bla bla bla...
      xyz: bla bla bla...
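
The order-preserving passes can also be scripted. The following is a sketch in Python under a few assumptions: the function name `merge_keep_order` is hypothetical, and a trailing newline is appended if the input lacks one, because the pattern needs the duplicate line to end in `\n`. The pattern here checks the full `key: ` prefix in the lookahead and allows zero intervening lines, so that a key like `ab` is not confused with `abc` and adjacent duplicates merge too.

```python
import re

# Group 1: key, group 2: rest of the first line, group 3: the lines in
# between (none of which start with "key: "), group 4: rest of the duplicate.
KEEP_ORDER = re.compile(
    r"^(\w+): (.*)\n((?:(?!\1: ).*\n)*)\1: (.*\n)",
    re.MULTILINE,
)


def merge_keep_order(text: str) -> str:
    """Merge lines sharing a key into the first occurrence, keeping order.

    Hypothetical helper; repeats the replacement until nothing changes.
    """
    if not text.endswith("\n"):
        text += "\n"  # assumption: normalise so the last line can match
    while True:
        text, count = KEEP_ORDER.subn(r"\1: \2 \4\3", text)
        if count == 0:
            return text


if __name__ == "__main__":
    sample = (
        "abc: bla1\ncde: bla\nghk: bla1\nlmn: bla\n"
        "abc: bla2\nbcd: bla\nghk: bla2\nxyz: bla\n"
    )
    print(merge_keep_order(sample), end="")
```

If the ` * abc: ` separator from the question is wanted in the output, changing the replacement string to `r"\1: \2 * \1: \4\3"` should produce it, though I have only reasoned that through on the sample data.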