
Division and logarithmic calculation in AWK


I have a three-column file like the one below. I want to divide column 3 by column 2 (ignoring the header) and print the result in column 4. I also want to calculate the log2 value of column 4 and print it in column 5, as shown below.

head my_file.txt
this    is header   
chrX:73829232:-::chrX:73831065:-    76.5382 76.34220209
chrX:73827985:-::chrX:73829067:-    60.0702 62.1887549
chr11:18266979:+::chr11:18269194:+  15.4004 1558.282058
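
awk has no log2() function. Its log() returns the natural logarithm, so the base-2 value comes from the change-of-base identity log2(x) = log(x) / log(2). A small sketch that checks this on the third sample row (the helper name `log2_ratio` is made up for illustration):

```shell
# Hypothetical helper: $1 = column 2 value, $2 = column 3 value.
# Prints column 3 / column 2 and its log2, with 10 significant digits.
log2_ratio() {
    awk -v c2="$1" -v c3="$2" 'BEGIN {
        r = c3 / c2
        printf "%.10g\t%.10g\n", r, log(r) / log(2)   # log2(r) = ln(r) / ln(2)
    }'
}

log2_ratio 15.4004 1558.282058   # third sample row
```

This should print values close to the third row of the desired output.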

I am trying with awk, but it gives fewer lines than expected, and some lines are repeated:

awk -v OFS='\t' 'FNR > 1 {$4 = $3 / $2}1' my_file.txt |awk -F"\t" 'FNR > 1{a = log($4)/log(2); print $0"\t" a} OFS="\t"'
awk: cmd. line:1: (FILENAME=my_file.txt FNR=15) fatal: division by zero attempted
this is header
chrX:73829232:-::chrX:73831065:-    76.5382 76.3422020852288    0.997439    -0.00369948
chrX:73829232:-::chrX:73831065:-    76.5382 76.3422020852288    0.997439
chrX:73827985:-::chrX:73829067:-    60.0702 62.1887548960591    1.03527 0.0500071
chrX:73827985:-::chrX:73829067:-    60.0702 62.1887548960591    1.03527
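
The repeated lines most likely come from the trailing `OFS="\t"` in the second awk program: written outside the braces, it is a pattern, not a setting. Its value (a tab) is a non-empty string and therefore true, and a pattern without an action prints the record, so each data line is printed once by `print $0"\t" a` and again unchanged. The fatal "division by zero attempted" error at FNR=15 aborts the first awk, which would explain the missing rows. A minimal sketch of the pattern behaviour (the `demo` function name is only for illustration):

```shell
# Hypothetical helper, only to show the behaviour.
# Rule 1 has no pattern, so it runs for every record.
# Rule 2 is the bare pattern OFS="\t": the assignment yields a tab, a
# non-empty string, which counts as true, and a pattern with no action
# prints the record a second time.
demo() {
    printf 'a\nb\n' | awk '{ print "action: " $0 } OFS="\t"'
}
demo    # each input line appears twice

# Setting OFS up front with -v (or in a BEGIN block) avoids the extra print.
printf 'a\nb\n' | awk -v OFS='\t' '{ print "action:", $0 }'
```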

This is my desired output.

this is my desired header
chrX:73829232:-::chrX:73831065:-    76.5382 76.34220209 0.9974392145    -0.003699170995
chrX:73827985:-::chrX:73829067:-    60.0702 62.1887549  1.035267985 0.05000426549
chr11:18266979:+::chr11:18269194:+  15.4004 1558.282058 101.1845185 6.66084476

Solution

  • You can try the following command:

    awk -v OFS='\t' -v OFMT='%.10g' -v CONVFMT='%.10g' 'FNR==1 {print $0, "col4", "col5"; next} {if ($2 == 0) {$4 = "NaN"; $5 = "NaN"} else {$4 = $3 / $2; $5 = log($4) / log(2)}} 1' my_file.txt

    The command above:

    • checks the first row (the header) and appends "col4" and "col5" as column names;
    • for the remaining rows, computes column 3 / column 2 into column 4 and its log2 into column 5 (awk has no log2(), so it uses log(x) / log(2)). If column 2 is zero, it writes "NaN" to both columns instead, which avoids the "division by zero attempted" error and clearly marks the result as not a valid number;
    • sets CONVFMT and OFMT to "%.10g" so the new columns carry 10 significant digits, as in the desired output; with awk's default of "%.6g" you would get 0.997439 instead of 0.9974392145.
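
The same logic can also be written as separate pattern-action rules, which is easier to read and extend. This is a sketch, not the answer's exact command: the function name `add_ratio_log2` and the `example_zero_row` input line are made up for illustration, and it goes slightly further than the one-liner by skipping blank or short lines and writing NaN when the ratio is not positive, since log() is undefined there.

```shell
# Hypothetical wrapper: reads a whitespace-separated file with one header
# line on stdin and writes tab-separated output with two extra columns.
add_ratio_log2() {
    awk -v OFS='\t' -v OFMT='%.10g' -v CONVFMT='%.10g' '
        FNR == 1 { print $0, "col4", "col5"; next }        # header: add column names
        NF < 3   { next }                                   # skip blank or short lines
        $2 == 0  { $4 = "NaN"; $5 = "NaN"; print; next }    # avoid division by zero
        {
            $4 = $3 / $2                                    # column 4 = column 3 / column 2
            $5 = ($4 > 0) ? log($4) / log(2) : "NaN"        # log2 is undefined for <= 0
            print
        }
    '
}

# Usage on the file from the question:
#   add_ratio_log2 < my_file.txt
printf '%s\n' 'this is header' \
    'chr11:18266979:+::chr11:18269194:+ 15.4004 1558.282058' \
    'example_zero_row 0 5.0' | add_ratio_log2
```

Unmodified fields keep their original text, so only the two new columns are affected by the "%.10g" format.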
