bash shell sh ksh

Putting the result of awk (multi-line variable) into another awk output


I just posted a question about using grep on a multi-line shell variable ("grep multiline shell variable from output of executable file"), but then I realized that what I needed was slightly different.

What I tried to do was this: I have a grep/awk result (I'll name this as result1):

blahblah ID1 blahblah aaa
blahblah ID2 blahblah bbb
blahblah ID3 blahblah ccc
...
blahblah ID(m) blahblah mmm
blahblah ID(n) blahblah nnn

And I have another awk result from an execution output (run | awk ~~~) (I'll name this as result2):

ID1 (some sentence 1)
ID2 (some sentence 2)
ID3 (some sentence 3)
...
IDn (some sentence n)

I'm trying to take the IDs (ID1~IDn) and the last part of each line (aaa~nnn) from result1 and append that last part to the matching line of result2. What I want to produce looks like this:

ID1 (sentence) aaa
ID2 (sentence) bbb
...
IDn (sentence) nnn

I somehow managed to get

ID1 aaa
ID2 bbb

from result1, so I have only the IDs that also appear in result2. But I have no idea how to split these pairs and attach each one to the matching line of result2 (ID1 with aaa, ID2 with bbb, and so on), so that I get

ID1 (sentence) aaa
ID2 (sentence) bbb
...
IDn (sentence) nnn

or something like this.

Also, ID1 ~ IDn may not always be in order.


Solution

  • Assumptions:

    • result1 has space-separated columns and the strings aaa ... nnn are in the last columns.
    • IDn in result1 consists of literal string ID followed by digits.
    • IDn in result2 are located in the first column.

    Then would you please try:

    awk '
        NR==FNR {
            if (match($0, /ID[0-9]+/)) {
                id = substr($0, RSTART, RLENGTH)
                a[id] = $NF
            }
            next
        }
        {
            print $0, a[$1]
        }
    ' result1 result2
    
    • The NR==FNR { .. ; next} block is an idiom that is executed only for the file given as the first argument (result1 in this case).
    • The function match($0, /ID[0-9]+/) returns a nonzero value (the position of the match) if a substring of the record matches the string ID followed by digits, and sets the awk variables RSTART and RLENGTH to the starting position and the length of the match, respectively.
    • substr($0, RSTART, RLENGTH) extracts the substring IDn where n is the digits.
    • a[id] = $NF associates the last part (e.g. aaa) with the id.
    • The {print $0, a[$1]} block is executed for result2 only.
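To see the lookup in action, here is a minimal sketch that runs the same awk program on made-up sample data. The temporary directory, the sample lines, and the variable name merged are assumptions for illustration only. The lines of result1 are deliberately out of order, to show that the array lookup does not depend on ordering.

```shell
#!/bin/sh
# Hypothetical sample data standing in for result1 and result2.
tmpdir=$(mktemp -d)
printf '%s\n' \
    'blahblah ID3 blahblah ccc' \
    'blahblah ID1 blahblah aaa' \
    'blahblah ID2 blahblah bbb' > "$tmpdir/result1"
printf '%s\n' \
    'ID1 (some sentence 1)' \
    'ID2 (some sentence 2)' \
    'ID3 (some sentence 3)' > "$tmpdir/result2"

# Same program as above: build the id -> last-field map from result1,
# then append the mapped value to each line of result2.
merged=$(awk '
    NR==FNR {
        if (match($0, /ID[0-9]+/)) {
            id = substr($0, RSTART, RLENGTH)
            a[id] = $NF
        }
        next
    }
    {
        print $0, a[$1]
    }
' "$tmpdir/result1" "$tmpdir/result2")

printf '%s\n' "$merged"
rm -rf "$tmpdir"
```

With this sample data the output follows the order of result2, each line ending in its matching value (aaa, bbb, ccc). If an ID in result2 has no counterpart in result1, a[$1] is empty and the line is printed with just a trailing space.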

    If result1 is the output of command1 .. and result2 is the output of command2 .., you can write:

    awk '
      (the same lines as above)
    ' <(command1 ..) <(command2 ..)
    

    instead of specifying the filenames.
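The <(...) process substitution syntax works in bash, ksh93, and zsh, but not in a plain POSIX sh. As a possible alternative for such shells, the sketch below saves the first command's output in a temporary file and pipes the second command into awk, using - as the name for standard input. The functions command1 and command2 are hypothetical stand-ins for the real commands, and tmpfile and portable are illustrative names.

```shell
#!/bin/sh
# command1 and command2 are hypothetical stand-ins for the real commands.
command1() {
    printf '%s\n' 'blahblah ID2 blahblah bbb' 'blahblah ID1 blahblah aaa'
}
command2() {
    printf '%s\n' 'ID1 (some sentence 1)' 'ID2 (some sentence 2)'
}

# Save the first command's output so awk can read it as its first file.
tmpfile=$(mktemp)
command1 > "$tmpfile"

# "-" makes awk read standard input (the output of command2) as the second file.
portable=$(command2 | awk '
    NR==FNR {
        if (match($0, /ID[0-9]+/)) {
            id = substr($0, RSTART, RLENGTH)
            a[id] = $NF
        }
        next
    }
    {
        print $0, a[$1]
    }
' "$tmpfile" -)

rm -f "$tmpfile"
printf '%s\n' "$portable"
```

One caveat with the NR==FNR idiom in either form: if the first input is empty, the condition also holds for the second input, so nothing is printed.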