
Adding constant value as column at the end of file using awk


I want to append a column with a constant value to the end of each line of a file in bash, while also selecting columns, applying a mathematical operation, and changing the field separator (which I believe is a tab) to a space.

My input file:

10:100968448:T:AA       0.3519  10      100968448       t       aa      1.0024  0.01    0.812
10:101574552:A:ATG      0.4493  10      101574552       a       atg     0.98906 0.0097  0.2585
10:102244152:A:AG       0.2008  10      102244152       a       ag      0.996705        0.0114  0.7701
10:102290698:A:AG       0.1899  10      102290698       a       ag      0.993024        0.0114  0.5431
10:104999458:T:TG       0.3449  10      104999458       t       tg      0.956763        0.0101  1.149e-05

If I put the constant in the second-to-last column:

awk -v OFS=" " 'BEGIN { FS = "\t" } ;  {print $1, $5, $6, log($7)/log(10), '105318', $9}' input

It works:

10:100968448:T:AA t aa 0.00104106 105318 0.812
10:101574552:A:ATG a atg -0.00477736 105318 0.2585
10:102244152:A:AG a ag -0.00143336 105318 0.7701
10:102290698:A:AG a ag -0.00304026 105318 0.5431
10:104999458:T:TG t tg -0.0191956 105318 1.149e-05

But when I try putting the constant at the end of each line, as I need it:

awk -v OFS=" " 'BEGIN { FS = "\t" } ;  {print $1, $5, $6, log($7)/log(10), $9, '105318'}' input

It doesn't work (the constant ends up in the first field):

 10531868448:T:AA t aa 0.00104106 0.812
 10531874552:A:ATG a atg -0.00477736 0.2585
 10531844152:A:AG a ag -0.00143336 0.7701
 10531890698:A:AG a ag -0.00304026 0.5431
 10531899458:T:TG t tg -0.0191956 1.149e-05

I even tried using the file where it works and shuffling the columns, and the constant ends up somewhere seemingly random. I have run dos2unix on this file, thinking there might be some odd character in it, but the problem remains. When I use a comma as the output field separator, I see multiple commas generated at the end of the file (when I try to include the constant as the last column).

For clarification, desired output:

10:100968448:T:AA t aa 0.00104106 0.812 105318 
10:101574552:A:ATG a atg -0.00477736 0.2585 105318 
10:102244152:A:AG a ag -0.00143336 0.7701 105318 
10:102290698:A:AG a ag -0.00304026 0.5431 105318 
10:104999458:T:TG t tg -0.0191956 1.149e-05 105318 

Any ideas?


Solution

  • Could you please try the following.

    awk '{print $1,$5,$6,log($7)/log(10),$NF,105318}' Input_file
    

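    If the file contains Control-M characters (carriage returns, \r), as Kamil's answer suggests, run the following instead. A carriage return left at the end of the last field moves the cursor back to the start of the line, so whatever is printed after it overwrites the first characters of the line, which matches the broken output in the question.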

    awk '{gsub(/\r/,"");print $1,$5,$6,log($7)/log(10),$NF,105318}' Input_file
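To check whether carriage returns really are the cause before stripping them, something like the following could be used. This is only a sketch: the temporary file, the two sample rows (copied from the question with \r\n line endings added) and the variable names tmp, cr_lines, n and result are assumptions made for illustration, not part of the original answer.

```shell
# Build a two-line, tab-separated sample with DOS line endings (trailing \r).
tmp=$(mktemp)
printf '10:100968448:T:AA\t0.3519\t10\t100968448\tt\taa\t1.0024\t0.01\t0.812\r\n' > "$tmp"
printf '10:101574552:A:ATG\t0.4493\t10\t101574552\ta\tatg\t0.98906\t0.0097\t0.2585\r\n' >> "$tmp"

# Count the lines that end in a carriage return; non-zero means CRs are present.
cr_lines=$(awk '/\r$/ { n++ } END { print n + 0 }' "$tmp")
echo "lines ending in CR: $cr_lines"

# Remove the trailing CR from each record, then print the constant as the last column.
# The constant is passed in with -v so no shell quoting is needed inside the program.
result=$(awk -F '\t' -v OFS=' ' -v n=105318 \
  '{ sub(/\r$/, ""); print $1, $5, $6, log($7)/log(10), $9, n }' "$tmp")
printf '%s\n' "$result"

rm -f "$tmp"
```

Running od -c on the first line of the real input file is another way to make a stray \r visible. If the count comes back as 0, the carriage return is probably not the culprit and the file is worth inspecting for other unusual bytes.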