Search code examples
shellawkreplacelookup

Why do I get blanks on some rows when using shell script to replace in file with lookup in second file?


I have one file data.txt with content

2013-04-24;1;0.1635;1.4135;fp.3.Luci_02C06.ctg.ctg7180000085546
2013-04-24;1;0.9135;1.4135;fp.3.Luci_01A01.ctg.ctg7180000038386
2017-04-24;2;0.9135;1.4135;fp.3.Luci_02C06.ctg.ctg7180000085546
2011-04-24;2;0.9135;1.4135;fp.3.Luci_02C06.ctg.ctg7180000085546
2012-04-24;2;0.9135;1.4135;fp.3.Luci_02C06.ctg.ctg7180000085549
2016-04-24;2;0.9135;1.4135;fp.3.Luci_02C06.ctg.ctg7180000085549
2016-04-24;2;0.9135;1.4335;fp.3.Luci_02C06.ctg.ctg7180000085549
2013-04-24;1;0.9135;1.4135;fp.3.Luci_01A01.ctg.ctg7180000038386
2011-04-24;2;0.9135;1.4135;fp.3.Luci_02C06.ctg.ctg7180000085546

and another file lookup.txt with content

1;2012-04-24;2ab1e4c0-de4d-11e2-a934-0f0479162b1b;fp.3.Luci_02C06.ctg.ctg7180000085546
7;2013-04-24;2ab21e90-de4d-11e2-9ce8-d368d9512bad;fp.3.Luci_01A01.ctg.ctg7180000038386
3;2014-04-24;2ab2582e-de4d-11e2-bb5f-6b1f6c4437f8;fp.3.Luci_02C06.ctg.ctg7180000085549

I want to replace column 5 in data.txt with column 1 in lookup.txt based on matching values in column 4 in lookup.txt. The result I want is

2013-04-24;1;0.1635;1.4135;1
2013-04-24;1;0.9135;1.4135;7
2017-04-24;2;0.9135;1.4135;1
2011-04-24;2;0.9135;1.4135;1
2012-04-24;2;0.9135;1.4135;3
2016-04-24;2;0.9135;1.4135;3
2016-04-24;2;0.9135;1.4335;3
2013-04-24;1;0.9135;1.4135;7
2011-04-24;2;0.9135;1.4135;1

I another post I found the following shell script which looks promising

awk -F';' 'NR==FNR{a[$4]=$1;next}{$5=a[$5]}1' lookup.txt data.txt

But the result below leaves for some reason column 5 blank for row 5,6,7 and 9. Why? The fact that we get space separators rather than ";" is also an issue but not as important

2013-04-24 1 0.1635 1.4135 1
2013-04-24 1 0.9135 1.4135 7
2017-04-24 2 0.9135 1.4135 1
2011-04-24 2 0.9135 1.4135 1
2012-04-24 2 0.9135 1.4135
2016-04-24 2 0.9135 1.4135
2016-04-24 2 0.9135 1.4335
2013-04-24 1 0.9135 1.4135 7
2011-04-24 2 0.9135 1.4135

Solution

  • You may use this awk solution:

    awk '
    BEGIN{FS=OFS=";"}
    {
       sub(/\r$/, "")
    } 
    NR == FNR {
       map[$NF] = $1
       next
    }
    {
       $NF = map[$NF]
    } 1' lookup.txt data.txt
    
    2013-04-24;1;0.1635;1.4135;1
    2013-04-24;1;0.9135;1.4135;7
    2017-04-24;2;0.9135;1.4135;1
    2011-04-24;2;0.9135;1.4135;1
    2012-04-24;2;0.9135;1.4135;3
    2016-04-24;2;0.9135;1.4135;3
    2016-04-24;2;0.9135;1.4335;3
    2013-04-24;1;0.9135;1.4135;7
    2011-04-24;2;0.9135;1.4135;1
    

    Explanation:

    • BEGIN{FS=OFS=";"}: Sets input and output field delimiters to ;
    • Use NF consistently instead of numbered column in both files
    • sub(/\r$/, "") is to remove any carriage characters from line end