I have one hundred files, each with three fields. Each one looks like this (with more lines):
#time data1 data2
20 1.9864547484940e+01 -3.96363547484940e+01
40 2.164547484949e+01 -3.2363547477060e+01
60 1.9800047484940e+02 -4.06363547484940e+02
…
They are large: some of them take up to 1.5G. I would like to reduce their size by saving the last two columns with lower precision and dropping the e+0?
exponent notation. For example, I would like to convert the four lines above to:
#time data1 data2
20 19.865 -39.636
40 21.645 -32.364
60 198.00 -406.36
…
I googled and came across awk's CONVFMT variable, but I don't know how to use it since I'm really not an awk pro. Is this the right tool for my case? If so, how should I use it?
I also thought of writing a C++ program, but a direct command line would be great.
I would use awk's printf function:
awk 'NR==1;NR>1{printf "%d %.3f %.3f\n", $1, $2, $3}' file
The above command outputs (note that %.3f keeps three decimals, so the last row differs slightly from the five-significant-digit sample in the question):
#time data1 data2
20 19.865 -39.636
40 21.645 -32.364
60 198.000 -406.364
Short explanation:

NR==1 evaluates to true if we are on the first line (NR is the number of the current record, i.e. the line number). If a condition is not followed by an action (between {}), awk simply prints the line; in this case, the header.

NR>1 evaluates to true on all lines except the first. It is followed by an action, which uses printf to achieve the desired result.
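As an aside, the CONVFMT variable mentioned in the question can also do this. CONVFMT is the format awk uses when it converts a number to a string outside of print statements, so forcing the two columns through a numeric operation rewrites them in that format. A sketch, assuming the files match a glob like *.dat and writing to a made-up .small name (adjust both to your setup):

```shell
# Process every data file; the *.dat glob and .small suffix are assumptions.
for f in *.dat; do
  awk 'BEGIN { CONVFMT = "%.3f" }   # format for number-to-string conversion
       NR == 1 { print; next }      # print the header line unchanged
       { $2 += 0; $3 += 0; print }  # adding 0 forces numeric conversion,
      ' "$f" > "$f.small"           # so the fields are rewritten via CONVFMT
done
```

Modifying a field makes awk rebuild the record, converting the changed fields to strings with CONVFMT, which is how the e+01 notation disappears. The printf approach above gives you the same control more explicitly.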