Tags: string, awk, replace, gsub, text-processing

Replace the nth occurrence of 'foo' and 'bar' with the first and second columns, respectively, of the nth line of a supplied file


I have a source.txt file, shown below, containing two columns of data. Each column is wrapped in square brackets [ ]:

[hot] [water]
[16] [boots and, juice]

I also have a target.txt file that contains empty lines, with a full stop at the end of each line of text:

the weather is today (foo) but we still have (bar). 

= (

the next bus leaves at (foo) pm, we can't forget to take the (bar).

I want to replace the nth foo in target.txt with the first-column contents of the nth line of source.txt, and the nth bar in target.txt with the second-column contents of that same line of source.txt.

I searched other sources to work out how to do this. I already have a command that replaces each nth occurrence of 'foo' with the nth line of a supplied file, but I couldn't adapt it:

awk 'NR==FNR {a[NR]=$0; next} /foo/{gsub("foo", a[++i])} 1' source.txt target.txt > output.txt;

I remember seeing a way to use gsub with two columns of data, but I don't remember exactly what the difference was.

EDIT: the text in target.txt sometimes contains symbols such as =, ( and ). I added these symbols to the sample because some answers will not work when they are present in target.txt.

Note: the number of lines in target.txt, and therefore the number of occurrences of foo and bar in it, can vary; I only showed a sample. However, each line that contains them has exactly one foo and one bar.


Solution

  • With your shown samples, please try the following answer. Written and tested in GNU awk.

    awk -F'\\[|\\] \\[|\\]' '
    FNR==NR{
      foo[FNR]=$2
      bar[FNR]=$3
      next
    }
    NF{
      gsub(/\<foo\>/,foo[++count])
      gsub(/\<bar\>/,bar[count])
    }
    1
    ' source.txt FS=" " target.txt
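To see why the two bracketed columns end up in $2 and $3, here is a quick check of how that field separator splits the question's sample lines. This is only an illustrative sketch; the printf format and the <...> markers around the values are my own additions.

```shell
# Feed the two sample source lines to awk and show what each field holds.
# The separator regex matches "[", "] [" or "]", so the leading "[" leaves
# an empty $1 and the trailing "]" leaves an empty last field.
printf '%s\n' '[hot] [water]' '[16] [boots and, juice]' |
awk -F'\\[|\\] \\[|\\]' '{ printf "NF=%d  $2=<%s>  $3=<%s>\n", NF, $2, $3 }'
```

Each line should report four fields, with the bracket contents in $2 and $3 and the brackets themselves consumed as separators.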
    

    Explanation: the same command with a comment on each line.

    awk -F'\\[|\\] \\[|\\]' '       ##Set the field separator to "[" OR "] [" OR "]".
    FNR==NR{                        ##FNR==NR is TRUE only while the first file, source.txt, is being read.
      foo[FNR]=$2                   ##Store the 2nd field (first bracketed column) in foo, indexed by line number.
      bar[FNR]=$3                   ##Store the 3rd field (second bracketed column) in bar, indexed by line number.
      next                          ##Skip all further statements for this line.
    }
    NF{                             ##For each line of target.txt that has at least one field (i.e. is not blank):
      gsub(/\<foo\>/,foo[++count])  ##Increment count, then replace every whole-word foo with foo[count].
      gsub(/\<bar\>/,bar[count])    ##Replace every whole-word bar with bar[count].
    }
    1                               ##Print the (possibly modified) line.
    ' source.txt FS=" " target.txt  ##Input files; FS=" " restores default field splitting before target.txt is read.
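
For anyone who wants to reproduce this, here is a self-contained run against the sample from the question. Treat it as a sketch under two assumptions of mine, not as the tested command above. First, the rule fires on lines that contain foo rather than on NF, because a non-blank line without foo (such as the "= (" line in the edited sample) would otherwise still advance the counter and shift every later replacement. Second, the \<...\> word-boundary operators, which are a GNU awk extension, are dropped so the script also runs under other awks. The counter name n and the output file name are mine as well.

```shell
# Recreate the question's sample files in the current directory.
printf '%s\n' '[hot] [water]' '[16] [boots and, juice]' > source.txt
printf '%s\n' \
  'the weather is today (foo) but we still have (bar).' \
  '' \
  '= (' \
  '' \
  "the next bus leaves at (foo) pm, we can't forget to take the (bar)." \
  > target.txt

awk -F'\\[|\\] \\[|\\]' '
FNR==NR { foo[FNR]=$2; bar[FNR]=$3; next }   # source.txt: store both columns by line number
/foo/   { n++                                # advance the counter only on lines holding foo
          gsub(/foo/, foo[n])
          gsub(/bar/, bar[n]) }
1                                            # print every target.txt line
' source.txt FS=" " target.txt > output.txt

cat output.txt
```

With the sample data, the first text line should come out with (hot) and (water), the "= (" line should pass through unchanged, and the last line should receive (16) and (boots and, juice).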
    


    EDIT: Adding a further solution that handles any number of [...] entries per line in source.txt and matches them, in order, against the foo/bar placeholders in target.txt. The OP confirmed in the comments that this works for them, so I am adding it here. Fair warning: this will fail when source.txt contains an &.

    awk '
    FNR==NR{
      while(match($0,/\[[^]]*\]/)){
        arr[++count]=substr($0,RSTART+1,RLENGTH-2)
        $0=substr($0,RSTART+RLENGTH)
      }
      next
    }
    {
      line=$0
      while(match(line,/\(?[[:space:]]*(\<foo\>|\<bar\>)[[:space:]]*\)?/)){
        val=substr(line,RSTART,RLENGTH)
        sub(val,arr[++count1])
        line=substr(line,RSTART+RLENGTH)
      }
    }
    1
    ' source.txt target.txt
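
On the & warning: in the replacement string of sub() and gsub(), an unescaped & stands for the matched text, so a source value like R&D would come out as RfooD. One possible workaround is to escape backslashes and ampersands in each stored value before using it as a replacement. The sketch below shows the idea on the simpler two-column layout; esc() is a hypothetical helper of mine, the *_amp.txt file names and sample data are made up for illustration, and the plain /foo/ and /bar/ patterns stand in for the word-boundary versions.

```shell
# Sample data containing ampersands (invented for this illustration).
printf '%s\n' '[R&D] [salt & pepper]' > source_amp.txt
printf '%s\n' 'ask (foo) for the (bar).' > target_amp.txt

awk -F'\\[|\\] \\[|\\]' '
# esc() is a helper introduced for this sketch; it is not part of the answer above.
# It prefixes every backslash and ampersand with a backslash so that
# sub()/gsub() insert them literally.
function esc(s) { gsub(/[\\&]/, "\\\\&", s); return s }

FNR==NR { foo[FNR]=esc($2); bar[FNR]=esc($3); next }   # store escaped values
/foo/   { n++; gsub(/foo/, foo[n]); gsub(/bar/, bar[n]) }
1
' source_amp.txt target_amp.txt > output_amp.txt

cat output_amp.txt
```

The same esc() call could presumably be wrapped around the substr(...) that fills arr in the solution above, though I have not tested that combination.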