sed, replace, capturing-group

Replace a pattern in lines with data found in another file, using sed in Bash


I have 2 files: a channellist file and a global source file containing data of all channels.

The channellist contains, for instance, these 2 lines. There are of course many more, and every line starts with #SERVICE 1:0:1:0:0:0:0:0:0:0::

#SERVICE 1:0:1:0:0:0:0:0:0:0::Channel 4#DESCRIPTION channel 4
#SERVICE 1:0:1:0:0:0:0:0:0:0::Sky History#DESCRIPTION sky history

Somewhere in the global source file there are, for instance, these lines:

28.2e channel4.uk 1:0:1:2085:808:2:11A0000:0:0:0: channel 4 
28.2e skyhistory.uk 1:0:1:C4C0:811:2:11A0000:0:0:0: sky history 

What I want is to replace, in the channellist file, the ‘old’ service reference 1:0:1:0:0:0:0:0:0:0: (IN PLACE: sed -i) with the corresponding one from the global source file, which is found by looking up the channel name. In this example the 2 'new' service references from the global source file are:

1:0:1:2085:808:2:11A0000:0:0:0: (= Channel 4)
1:0:1:C4C0:811:2:11A0000:0:0:0: (= Sky History)

So in the channellist file I want to replace '1:0:1:0:0:0:0:0:0:0:' by '1:0:1:2085:808:2:11A0000:0:0:0:' for Channel 4 and by '1:0:1:C4C0:811:2:11A0000:0:0:0:' for Sky History.

What should be done for the line with Channel 4 is, for instance:

  1. grab the channel name in the channellist file (which is, for instance, the text after ‘#DESCRIPTION’ on a line);
  2. find the line in the global source file with that channel name (all channel names are already converted to lowercase everywhere);
  3. get from that line (in the global source file) the ‘new’ service reference, which sits after the 2nd space and before the 3rd space;
  4. replace in the channellist the old service reference (1:0:1:0:0:0:0:0:0:0:) with the new one (1:0:1:2085:808:2:11A0000:0:0:0:).
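
The four steps above can be sketched as a single awk pass over both files. This is only a sketch under assumptions: the file names global.txt and channellist.txt are placeholders, the channel name after #DESCRIPTION is assumed to match the name in the global source file exactly, and the result is kept in a variable here instead of being written back in place.

```shell
# Placeholder files holding the sample lines from the question.
workdir=$(mktemp -d)
printf '%s\n' \
  '28.2e channel4.uk 1:0:1:2085:808:2:11A0000:0:0:0: channel 4 ' \
  '28.2e skyhistory.uk 1:0:1:C4C0:811:2:11A0000:0:0:0: sky history ' \
  > "$workdir/global.txt"
printf '%s\n' \
  '#SERVICE 1:0:1:0:0:0:0:0:0:0::Channel 4#DESCRIPTION channel 4' \
  '#SERVICE 1:0:1:0:0:0:0:0:0:0::Sky History#DESCRIPTION sky history' \
  > "$workdir/channellist.txt"

result=$(awk '
NR == FNR {
    # First file (global source): field 3 is the service reference,
    # fields 4..NF together form the channel name.
    name = $4
    for (i = 5; i <= NF; i++) name = name " " $i
    map[name] = $3
    next
}
{
    # Second file (channellist): the channel name follows "#DESCRIPTION ".
    pos = index($0, "#DESCRIPTION ")
    if (pos > 0) {
        name = substr($0, pos + length("#DESCRIPTION "))
        sub(/[ \t]+$/, "", name)
        # Replace the old reference only when the lookup succeeds.
        if (name in map) sub(/1:0:1:0:0:0:0:0:0:0:/, map[name])
    }
    print
}' "$workdir/global.txt" "$workdir/channellist.txt")

printf '%s\n' "$result"
```

To change the channellist itself, the output could be redirected to a temporary file that is then moved over the original.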

This may be the wrong direction, since I do not know whether sed group capturing is needed for the solution. I do know how to change the service reference 'manually' with sed group capturing, but without the lookup. So the question is how to look up the data in the global source file by the channel name, which is in captured group 4 (see below).

SED group capturing channellist file:

  1. capture group 1: from the beginning to #SERVICE [space] => (.*#SERVICE )
  2. capture group 2: from 1, capture everything till :: (which is in fact the service reference which must be replaced) => (.*::)
  3. capture group 3: from 2, capture everything till #DESCRIPTION [space] => (.*#DESCRIPTION )
  4. capture group 4: from 3, capture everything till the end (which is in fact the channel name) => (.*)

As a test, I used (just for now) the variables oldLine (which is a line from the channellist file) and newSREF:

oldLine="#SERVICE 1:0:1:0:0:0:0:0:0:0::Channel 4#DESCRIPTION channel 4"
newSREF="1:0:1:2085:808:2:11A0000:0:0:0:"

The command is:

sed "s/\(.*#SERVICE \)\(.*::\)\(.*#DESCRIPTION \)\(.*\)/\1${newSREF}:\3\4/" <<< "$oldLine"
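
One way to add the missing lookup to this capture-group approach is a loop that extracts group 4 per line, searches the global source file for it, and only then runs the substitution. The sketch below is an assumption-laden illustration, not a tuned solution: global.txt and channellist.txt are placeholder names, it starts one sed and one awk process per line (slow on long lists), and it assumes a service reference never contains '/' or '&'.

```shell
workdir=$(mktemp -d)
printf '%s\n' \
  '28.2e channel4.uk 1:0:1:2085:808:2:11A0000:0:0:0: channel 4 ' \
  '28.2e skyhistory.uk 1:0:1:C4C0:811:2:11A0000:0:0:0: sky history ' \
  > "$workdir/global.txt"
printf '%s\n' \
  '#SERVICE 1:0:1:0:0:0:0:0:0:0::Channel 4#DESCRIPTION channel 4' \
  '#SERVICE 1:0:1:0:0:0:0:0:0:0::Sky History#DESCRIPTION sky history' \
  > "$workdir/channellist.txt"

pattern='\(.*#SERVICE \)\(.*::\)\(.*#DESCRIPTION \)\(.*\)'

result=$(while IFS= read -r oldLine; do
    # Capture group 4 is the channel name after "#DESCRIPTION ".
    name=$(printf '%s\n' "$oldLine" | sed "s/$pattern/\4/")
    # Look up the line whose fields 4..NF equal that name; print field 3.
    newSREF=$(awk -v want="$name" '
        { n = $4; for (i = 5; i <= NF; i++) n = n " " $i }
        n == want { print $3; exit }' "$workdir/global.txt")
    if [ -n "$newSREF" ]; then
        # Group 2 holds the old reference plus both colons; the new
        # reference ends in one colon, so one more colon is appended.
        printf '%s\n' "$oldLine" | sed "s/$pattern/\1${newSREF}:\3\4/"
    else
        # No match in the global source file: keep the line unchanged.
        printf '%s\n' "$oldLine"
    fi
done < "$workdir/channellist.txt")

printf '%s\n' "$result"
```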

Solution

  • Thanks to Walter A. for his excellent suggestion(s). I visited the link he posted in his 1st reaction here, and there (in the 1st reaction/answer) I found idiomatic awk. Under ‘Two-file processing’ it changes the opcodes in the file data.txt by making use of the file map.txt with the command: awk 'NR == FNR{a[$1]=$2;next} {$3=a[$3]}1' map.txt data.txt

    1. First I used awk to convert the channel names in both files (global source file and channellist file) to lowercase.

    2. Then I reduced the global source file to 2 columns: service reference (column 1) and channel name (column 2); more data is not needed.

    3. In the channellist file I added a space at the end of the service reference and before the channel name (so after the 2 colons) to create an extra column for the service reference, which will be used later for easy replacement.

    4. Because a channel name can consist of multiple words and thus span multiple columns, I changed in both files (by using awk) the spaces (if any) between columns 2, 3, 4, 5 etc. to 2 underscores (‘__’), so each channel name now sits in 1 column. This will be changed back later of course.

    5. Because I am using an old version of awk (the option '-i inplace' is not available), I copied the ‘channellist file’ to ‘channellist file_tmp’ (as a temporary file).
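
As an illustration of steps 1, 2 and 4 for the global source file, the sketch below lowercases the channel name, keeps only the service reference and the name, and joins the words of the name with 2 underscores. The variable names are made up for the example, and the sample lines use mixed case on purpose to show the lowercase step; the exact commands used in the answer are not shown in the post.

```shell
# Sample lines in the layout of the global source file.
global_lines='28.2e channel4.uk 1:0:1:2085:808:2:11A0000:0:0:0: Channel 4
28.2e skyhistory.uk 1:0:1:C4C0:811:2:11A0000:0:0:0: Sky History'

# Field 3 is the service reference; fields 4..NF form the channel name.
reduced=$(printf '%s\n' "$global_lines" | awk '{
    name = tolower($4)
    for (i = 5; i <= NF; i++) name = name "__" tolower($i)
    print $3, name
}')
printf '%s\n' "$reduced"

# Changing the underscores back to spaces afterwards.
restored=$(printf '%s\n' "$reduced" | sed 's/__/ /g')
printf '%s\n' "$restored"
```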

    Now I was ready to take the channel names (2nd column in the global source file) with the corresponding service reference (1st column), look up the same channel name in the channellist file, and replace the ‘old’ service reference there with the one from the global source file, using the command:

    awk 'NR == FNR{a[$2]=$1;next} {$2=a[$3]}1' [global source file] [channellist file]_tmp > [channellist file]
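
One caveat with this command: when a channel name from the channellist is not present in the global source file, a[$3] is empty and the old service reference is overwritten with an empty string. A guarded variant, sketched below, leaves such lines untouched. The three-column layout of the temporary channellist used here (marker, service reference, channel name) is an assumption for the example, as are the file names.

```shell
workdir=$(mktemp -d)
# Reduced global source file: service reference, channel name.
printf '%s\n' \
  '1:0:1:2085:808:2:11A0000:0:0:0: channel__4' \
  '1:0:1:C4C0:811:2:11A0000:0:0:0: sky__history' \
  > "$workdir/global_2col.txt"
# Assumed layout of the temporary channellist; the second name has no match.
printf '%s\n' \
  '#SERVICE 1:0:1:0:0:0:0:0:0:0: channel__4' \
  '#SERVICE 1:0:1:0:0:0:0:0:0:0: unknown__channel' \
  > "$workdir/channellist_tmp.txt"

# Same idiom, but column 2 is only replaced when the name is a known key.
result=$(awk 'NR == FNR { a[$2] = $1; next }
              ($3 in a) { $2 = a[$3] }
              1' "$workdir/global_2col.txt" "$workdir/channellist_tmp.txt")

printf '%s\n' "$result"
```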