Search code examples
c#sqlstatisticsbinning

How to dynamically generate bands/groups of data with similar numbers in each?


I want to dynamically generate bands, that will then be grouped in reports.

My first thought was generate the bands by taking the minimum value and the maximum value and then dividing up the difference.

For instance suppose you had the salaries for a large group of people:

  • The lowest paid earns £12,000 a year and the highest earns £3,000,000
  • So I split that into 10 bands of similar size: (£3mill - £12k) / 10 = £298800
  • So my first band goes £12k to £310,800 and gets thousands of people in it
  • My second band goes £310k to £610k and has a few hundred
  • Every other band has a few people in each one

So then this isn't actually very useful. If I were to manually create the bands I'd want roughly similar numbers in each, something like: £12k-£14k, £14k-£18k, £18k-£25k, £25-£35k, ..., £1.5-£3million

This is just one example - there could be lots of different distributions.

I'm looking for an algorithm to generate the bands, so users would enter how many bands they want and the data would be grouped into that many bands with a similar number in each.

The banding needs to be quick - I can't just loop through the entire dataset.

The application is C# on top of SQL, but solutions from other languages welcome.


Solution

  • i think you are asking about how to query an existing dataset into the 'bands'...

    if this is true, then Oracle supports NTILE aggregate functions fo rthis purpose. There should be equivalents in other SQL implementations.