linux performance awk decimal processing-efficiency

Linux - more efficient way to calculate price values within a file

I have a rather large file (over 500,000 lines and about 2000 characters per line) that I need to parse through and total up the values in a specific field.

I'm currently using the following, but it's taking about 20 minutes to run:

grep ^D $INPUTFILE | cut -c 374-383 | paste -sd+|bc -l

More information on my file: - it's a pipe delimited file - the values may contain decimal values

I'm using RHEL. (anyone have any ideas on a more efficient way of doing this)

SAMPLE LINE: (1 of many)

D|||W239337273 ||||APPLE ||ORANGES |||||||||||||||||||||00172572880|||||||||||
| | | | | |||||||||||||||||||||| |||||||||||||||||||||||||||||||||||||||||||||
|||||||||||||| 0.89| 1.95|| 1.95|||||||||||||||||||||| 0 .00||1 ||| 0.00||||||
|| 2.33|| 145.32|||||||| 0.83|||||||||||| 2.78| 37.99|||||||||||||||||||||||||
||||||||||||RAVINDERSINGH13 ||DAVID RANKIN | |19230129|01 |||STACK||2|||
|OVERFLOW ||||||||||||2076671331|046403139|1| |00||004336|||||||||||| ||||
|FISH |||||||2007878|||||COCONUTS |||CRK 0047 |||||||193264243231187998 ||||||
||30 ||| ||20191122||||1258338132 |C|||N||||||||||||||1 |||||||||6326812132 ||
|||0 | |0|||B2|||||||| |20191122|||||||||||BS1ACG962 |||THOR |CAPTAIN AMERICA
|||AVENGERS||1306885298 |||||||||||||||||||||||| |||||||||||||||||||||||||||||
|||||||||||||||||||||||||||||||||||||||||||||

Sorry, the above sample was modified for security and my column values were altered by mistake. For testing purposes based on comments below, I created a sample file with 5 entries (duplicates of the sample line above) and ran the following 2 commands successfully. Going to try to run it for the full file to check performance.

sh-4.2$ awk -F'|' '/^D/{sum += $166} END {print sum}' sample.txt

11.65

sh-4.2$ grep ^D sample.txt | cut -c407-416 | paste -sd+|bc -l

11.65

sh-4.2$ grep ^D full_file.txt | cut -c 374-383 | paste -sd+|bc -l
2316826.10

sh-4.2$ awk -F'|' '/^D/{sum += $153} END {print sum}' full_file.txt
2.31683e+06

I double checked the fields to make sure I was using the same fields in both commands:

sh-4.2$ cut -c 374-383 full_file.txt > test1

sh-4.2$ awk -F'|' '{print $153}' full_file.txt > test2

sh-4.2$ diff test1 test2

Solution

Could you please try following, based on OP's attempts and comments.

awk 'BEGIN{FS="|"}/^D/{sum+=$166} END{printf("%6.2f\n",sum)}'  Input_file