linux awk gsub

Awk iteratively replacing strings from array


I've recently been trying to do the following in awk: we have two files (F1.txt and F2.txt.gz). While streaming the second one, I want to replace every occurrence of each entry from F1.txt with a substring of that entry. I got this far:

zcat F2.txt.gz |
    awk 'NR==FNR {a[$1]; next}
    {for (i in a)
         $0=gsub(i, substr(i, 0, 2), $0) #this does not work of course
    }
    {print $0}
' F1.txt -

I was wondering how to do this properly in awk. Thanks!


Solution

  • Please correct my assumptions if they are wrong.

    You have two files; the first contains a set of entries. If any word in the second file matches one of these entries, replace that word with its first two characters.

    Example:

    ==> file1 <==
    Azerbaijan
    Belarus
    Canada
    
    ==> file2 <==
    Caspian sea is in Azerbaijan
    Belarus is in Europe
    Canada is in metric system.
    
    
    $ awk 'NR==FNR {a[$1]; next} 
                   {for(i=1;i<=NF;i++) 
                       if($i in a) $i=substr($i,1,2)}1' file1 file2
    
    Caspian sea is in Az
    Be is in Europe
    Ca is in metric system.
    

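    Note that substring indices start at 1 in awk, so use substr(s, 1, 2) rather than substr(s, 0, 2) to get the first two characters (in gawk, substr(s, 0, 2) returns only the first character).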
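
The one-liner above only replaces whole whitespace-separated fields. If the goal is to replace every occurrence, including inside longer words as the question describes, a gsub-based loop may be closer to what was intended. The following is a minimal sketch under a few assumptions: the temporary directory, the sample data, and the extra line "Belarusian food" are made up for illustration, and the entries are assumed to contain no regex metacharacters, since gsub treats its first argument as a regular expression. The key point is that gsub modifies $0 in place and returns the number of substitutions, so its result should not be assigned back to $0.

```shell
# Hypothetical sample data in a temp dir, mirroring the example above
dir=$(mktemp -d)
printf '%s\n' Azerbaijan Belarus Canada > "$dir/file1"
printf '%s\n' 'Caspian sea is in Azerbaijan' \
              'Belarus is in Europe' \
              'Canada is in metric system.' \
              'Belarusian food' > "$dir/file2"

result=$(awk '
    NR == FNR { a[$1]; next }        # first file: store each entry as an array key
    {
        for (k in a)                 # try every stored entry against the current line
            gsub(k, substr(k, 1, 2)) # edits $0 in place; the return value is a count
        print
    }
' "$dir/file1" "$dir/file2")

printf '%s\n' "$result"
rm -rf "$dir"
```

To stream the compressed file as in the question, the same awk program can be fed with `zcat F2.txt.gz | awk '...' F1.txt -`. Because the iteration order of `for (k in a)` is unspecified, the result can depend on that order when one entry is a substring of another or of another entry's replacement.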