I have a text file containing some thousand lines as follows:
File:
abc: bla1 bla1 bla1...
cde: bla bla bla...
ghk: bla1 bla1 bla1...
lmn: bla bla bla...
abc: bla2 bla2 bla2...
bcd: bla bla bla...
ghk: bla2 bla2 bla2...
xyz: bla bla bla...
I want to merge all the lines that start with the same item (such as lines 1 and 5, or 3 and 7) so that I get a new text file like this:
New File:
abc: bla1 bla1 bla1... * abc: bla2 bla2 bla2...
cde: bla bla bla...
ghk: bla1 bla1 bla1... * ghk: bla2 bla2 bla2...
lmn: bla bla bla...
bcd: bla bla bla...
xyz: bla bla bla...
I wonder whether this can be solved using regex and/or grep, and if so, how?
I'm quite familiar with grep because I'm on TextWrangler, but I'm also OK with other text editors.
Help much appreciated.
If order doesn't matter, I suggest first sorting the text. That will place
abc: ...
abc: ...
next to one another. Then run this search-and-replace repeatedly until it finds no more matches (a key that occurs more than twice needs more than one pass):
Search:
^(\w+): (.*)\n\1:
Replace:
\1: \2
Result:
abc: bla1 bla1 bla1... bla2 bla2 bla2...
bcd: bla bla bla...
cde: bla bla bla...
ghk: bla1 bla1 bla1... bla2 bla2 bla2...
lmn: bla bla bla...
xyz: bla bla bla...
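If you would rather script this than repeat the replace by hand, the sort-then-merge steps above can be sketched in Python. This is only an illustration, not something TextWrangler runs for you: the function name `merge_sorted` is made up here, and it assumes every line has the `key: text` shape shown in the question.

```python
import re

# Same search pattern as above; MULTILINE lets ^ match at the start of each line.
PAIR = re.compile(r"^(\w+): (.*)\n\1:", re.MULTILINE)

def merge_sorted(text):
    """Sort the lines, then merge neighbours that share a key until nothing changes."""
    # Sorting puts all lines with the same "key: " prefix next to one another.
    text = "\n".join(sorted(text.splitlines())) + "\n"
    while True:
        # One pass of the search/replace; count is the number of merges made.
        text, count = PAIR.subn(r"\1: \2", text)
        if count == 0:
            return text
```

The loop plays the role of the repeated passes: it stops as soon as a pass makes no replacement.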
If order DOES matter, then this regex can be run through a few times:
Search:
^(\w+): (.*)\n((?:(?!\1:).*\n)+)\1: (.*\n)
Replace:
\1: \2 \4\3
Result (1st pass):
abc: bla1 bla1 bla1... bla2 bla2 bla2...
cde: bla bla bla...
ghk: bla1 bla1 bla1...
lmn: bla bla bla...
bcd: bla bla bla...
ghk: bla2 bla2 bla2...
xyz: bla bla bla...
Result (2nd pass):
abc: bla1 bla1 bla1... bla2 bla2 bla2...
cde: bla bla bla...
ghk: bla1 bla1 bla1... bla2 bla2 bla2...
lmn: bla bla bla...
bcd: bla bla bla...
xyz: bla bla bla...
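The order-preserving passes can be automated the same way. The sketch below is a rough Python version, with a hypothetical function name `merge_in_place` and the same assumption that lines look like `key: text`. The pattern is adapted from the search above: the lookahead is written `(?!\1:)`, colon included, so a longer key such as `abcd` is not mistaken for `abc`, and the middle group uses `*` so that adjacent lines with the same key are merged as well.

```python
import re

# Group 3 captures the lines sitting between two occurrences of the same key.
# MULTILINE lets ^ match at the start of each line.
GAP = re.compile(r"^(\w+): (.*)\n((?:(?!\1:).*\n)*)\1: (.*\n)", re.MULTILINE)

def merge_in_place(text):
    """Repeat the replace until no two lines share a key; first occurrences keep their place."""
    if not text.endswith("\n"):
        text += "\n"  # the pattern expects every line to end with a newline
    while True:
        # Append the later line's text to the earlier line, then put back the lines in between.
        text, count = GAP.subn(r"\1: \2 \4\3", text)
        if count == 0:
            return text
```

On the sample from the question this should take two merging passes, like the manual run above, plus a final pass that finds nothing and ends the loop.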