linux, awk, double-precision, data-files

Change double precision of data file


I have one hundred files, each with three columns. Each one looks like this (with more lines):

#time data1 data2
20 1.9864547484940e+01 -3.96363547484940e+01
40 2.164547484949e+01 -3.2363547477060e+01 
60 1.9800047484940e+02 -4.06363547484940e+02
…

They are large, and some of them take up to 1.5 GB. I would like to reduce their size by saving the last two columns with lower precision and dropping the e+0? exponent part. For example, I would like to convert the four lines above to:

#time data1 data2
20 19.865 -39.636
40 21.645 -32.364
60 198.00 -406.36
…

I googled and came across awk's CONVFMT variable, but I don't know how to use it since I'm really not an awk pro. Is this the right tool to use in my case? If so, how should I use it?

I also thought of writing a small C++ program, but a direct command line would be great.
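
For reference, CONVFMT controls how awk converts numbers to strings (its default is "%.6g"). Below is a minimal sketch of how it could apply here, assuming an awk where assigning a numeric value to a field rebuilds the record through CONVFMT; with "%.5g" (five significant digits) the result should roughly match the sample output above:

    # force $2 and $3 to be re-converted to strings via CONVFMT on data lines;
    # the trailing "1" prints every line, including the untouched header
    awk 'BEGIN{CONVFMT="%.5g"} NR>1{$2+=0; $3+=0} 1' file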


Solution

  • I would use awk's printf function:

    awk 'NR==1;NR>1{printf "%d %.3f %.3f\n", $1, $2, $3}' file
    

    The above command outputs:

    #time data1 data2
    20 19.865 -39.636
    40 21.645 -32.364
    60 198.000 -406.364
    

    Short explanation:

    NR==1 evaluates to true on the first line of input (NR is the current record number). If a condition is not followed by an action (between {}), awk simply prints the line, in this case the header.

    NR>1 evaluates to true on every line after the first. It is followed by an action that uses printf to format the fields: the time as an integer and the two data columns with three decimal places.
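
    Since there are one hundred files, one possible way to apply this to each of them is a small shell loop. This is only a sketch: the *.dat glob and the .tmp suffix are assumptions, so adjust both to your filenames.

    # hypothetical batch run: rewrite each file in place via a temporary file
    for f in *.dat; do
        awk 'NR==1;NR>1{printf "%d %.3f %.3f\n", $1, $2, $3}' "$f" > "$f.tmp" && mv "$f.tmp" "$f"
    done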