
Create histogram-like bins with awk


Here's my input file:

1.37987
1.21448
0.624999
1.28966
1.77084
1.088
1.41667

I would like to create bins of a size of my choice to get a histogram-like output, e.g. something like this for bins of width 0.1, starting from 0:

0 0.1 0
...
0.5 0.6 0
0.6 0.7 1
...
1.0 1.1 1
1.1 1.2 0
1.2 1.3 2
1.3 1.4 1
...

My file is too big for R, so I'm looking for an awk solution (also open to anything else that I can understand, as I'm still a Linux beginner).

This was more or less already answered in the post "awk histogram in buckets", but that solution is not working for me.


Solution

  • This is also possible:

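    awk -v size=0.1 '
      # floor($1/size): int() truncates toward zero, so step down for negatives
      { q=$1/size; b=int(q); if (b>q) b--; a[b]++; bmax=b>bmax?b:bmax; bmin=b<bmin?b:bmin }
      # a[i]+0 prints 0 instead of an empty field for empty buckets
      END { for(i=bmin;i<=bmax;++i) print i*size,(i+1)*size,a[i]+0 }' <file>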
    

    It does essentially the same as EdMorton's solution, but starts printing from the lowest bucket seen (bmin), which defaults to 0, so buckets for negative numbers are printed as well.
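
To try the bucketing idea on the sample input from the question, here is a minimal sketch wrapped in a shell function. The function name `bin_counts` and the blank-line filter (`NF`) are assumptions made for illustration. The sketch floors `$1/size` to get the bucket index, starts at bucket 0 unless a negative value is seen, and prints a count of 0 for empty buckets.

```shell
# Hypothetical helper: bin_counts SIZE [FILE...]
# Reads one number per line (from FILE or stdin) and prints "lower upper count".
bin_counts() {
  size=$1
  shift
  awk -v size="$size" '
    NF {                              # assumption: ignore blank lines
      q = $1 / size
      b = int(q)                      # int() truncates toward zero ...
      if (b > q) b--                  # ... so step down once for negative values
      count[b]++
      if (b > bmax) bmax = b          # bmax and bmin both start at 0 (unset)
      if (b < bmin) bmin = b
    }
    END {
      for (i = bmin; i <= bmax; ++i)
        print i * size, (i + 1) * size, count[i] + 0   # +0 turns empty into 0
    }
  ' "$@"
}

# Usage with the sample values from the question:
printf '%s\n' 1.37987 1.21448 0.624999 1.28966 1.77084 1.088 1.41667 | bin_counts 0.1
```

With the sample values this prints one line per bucket from 0 up to the bucket holding 1.77084, including the line `1.2 1.3 2`. Bounds are printed with awk's default number formatting, so 1.0 appears as `1`; a `printf` with a fixed format would be needed to get `1.0`.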