Printing thousands-separated floats with gawk


I have to process a huge file with gawk. My main problem is that I have to print some floats with thousands separators. For example, 10000 should appear as 10.000 and 10000,01 as 10.000,01 in the output.

I (with Google's help) came up with this function:

function commas(n) {
  gsub(/,/,"",n)
  point = index(n,".") - 1
  if (point < 0) point = length(n)
  while (point > 3) {
    point -= 3
    n = substr(n,1,point)"."substr(n,point + 1)
  }
  sub(/-\./,"-",n)
  return n
}

But it fails with floats.

Now I'm thinking of splitting the input into an integer part and a fractional part, formatting the integer part, and then gluing the two back together. Isn't there a better way to do it?

Disclaimer:

  • I'm not a programmer
  • I know the thousands separator can be set through some shell environment variables, but this must work in different environments with different language and/or locale settings.
  • English is my 2nd language, sorry if I'm using it incorrectly

Solution

  • It fails with floats because you're passing in European-style numbers (1.000.000,25 for a million and a quarter). The function strips every comma and then looks for a period as the decimal point, so 10000,01 is treated as the seven-digit integer 1000001. It should work if you just swap the commas and periods. I'd test the current version first with 1000000.25 to see if it works with non-European numbers.

    The following awk script can be called with "echo 1 | awk -f xx.gawk" and it will show you both the "normal" and European version in action. It outputs:

    123,456,789.1234
    123.456.789,1234
    

    Obviously, you're only interested in the functions. Real-world code would pass values to them from the input stream rather than from a fixed string.

    function commas(n) {
        gsub(/,/,"",n)
        point = index(n,".") - 1
        if (point < 0) point = length(n)
        while (point > 3) {
            point -= 3
            n = substr(n,1,point)","substr(n,point + 1)
        }
        return n
    }
    function commaseuro(n) {
        gsub(/\./,"",n)
        point = index(n,",") - 1
        if (point < 0) point = length(n)
        while (point > 3) {
            point -= 3
            n = substr(n,1,point)"."substr(n,point + 1)
        }
        return n
    }
    { print commas("1234,56789.1234") "\n" commaseuro("12.3456789,1234") }
    

    The functions are identical except in their handling of commas and periods. We'll call them separators and decimals in the following description:

    • gsub removes all of the existing separators, since we'll be putting them back in the right places.
    • point is set to the position just before the decimal, since that's our starting point.
    • if there's no decimal, the if statement sets the starting point to the end of the string instead.
    • we loop while there are more than three digits left of the current position.
    • inside the loop, we move the position three digits to the left and insert a separator there.
    • once the loop is finished, we return the adjusted value.
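
    The steps above can be folded into a single function that takes the separator and the decimal character as arguments. The following is only a sketch: the shell wrapper group_digits and the awk function group are hypothetical names, not part of the original answer, and it assumes the separator and the decimal are two different single characters. It also keeps a leading minus sign out of the digit count and reads one number per line from standard input instead of using a fixed string.

```shell
# Hypothetical helper (not from the original answer).
# Usage: group_digits SEP DEC   reads one number per line from stdin.
group_digits() {
    awk -v sep="$1" -v dec="$2" '
    # Extra parameters after the gap (sign, point, i) act as local variables.
    function group(n, sep, dec,    sign, point, i) {
        # Set a leading minus aside so it is never counted as a digit.
        if (substr(n, 1, 1) == "-") { sign = "-"; n = substr(n, 2) }
        # Remove existing separators literally (index, not a regex,
        # so "." is not treated as a wildcard).
        while (sep != "" && (i = index(n, sep)) > 0)
            n = substr(n, 1, i - 1) substr(n, i + 1)
        # Start at the decimal character, or at the end if there is none.
        point = index(n, dec) - 1
        if (point < 0) point = length(n)
        # Walk left three digits at a time, inserting a separator.
        while (point > 3) {
            point -= 3
            n = substr(n, 1, point) sep substr(n, point + 1)
        }
        return sign n
    }
    { print group($0, sep, dec) }
    '
}

# European style: "." groups thousands, "," marks the decimals.
printf '%s\n' 10000 10000,01 -1234567,5 | group_digits . ,
# 10.000
# 10.000,01
# -1.234.567,5
```

    Swapping the two arguments (group_digits , .) should give the comma-grouped form, for example 1,000,000.25 for 1000000.25.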