Search code examples
sqlhistogramlogarithm

How do I make a logarithmic histogram in SQL?


My table:

val
1
2
3
4
5
6
10
15

Desired results:

bin | qty 
1   | 1
2   | 2
4   | 3
8   | 3

Which means there are, with inclusive/exclusive ranges,

  • 1 value between 1-2,
  • 2 values between 2-4,
  • 3 values between 4-8,
  • 3 values between 8-16.

Solution

  • Your bin size, in this case, in logarithmic terms, is 2.

    If you wish to use another bin size, substitute the 2s in the script below.

    select 
        pow(2, floor(ln(val) / ln(2))) as bin,
        count(bin) as qty
    from 
        mytable
    group by
        bin;
    

    Explanation

    First, we take the log of your values in base 2. log(val, 2) might work in some RDBMS, but if not, just remember the log property that log(val, 2) = ln(val) / ln(2).

    val | ln(val) / ln(2)
    1   | 0
    2   | 1
    3   | 1.58496250072
    4   | 2
    5   | 2.32192809489
    

    Then we floor this:

    val | floor(ln(val) / ln(2))
    1   | 0
    2   | 1
    3   | 1
    4   | 2
    5   | 2
    

    Finally, we use the power of two to transform these floored values into the logarithmic bin values.

    val | pow(2, floor(ln(val) / ln(2)))
    1   | 1
    2   | 2
    3   | 2
    4   | 4
    5   | 4
    

    The rest is simply grouping by the logarithmic bins and counting.

    Caveats

    No pow

    If your RDBMS does not support pow(x, y), you can use exp(y * ln(x)). The expression then becomes:

        exp(floor(ln(val) / ln(2)) * ln(2))
    

    Zero

    log(0) is undefined. In the RDBMS I tested, it returns null.

    If your table has values of 0, you most likely want to bin them between 0 and 1. To do that, you can wrap the whole expression with ifnull(..., 0), like so:

        ifnull(pow(2, floor(ln(val) / ln(2))), 0)
    

    Negative

    The log of negative numbers is undefined... But you probably want to bin them like [0 to -1), [-1 to -2), [-2 to -4), [-4 to -8), etc.

    If your database has negative values, you can achieve that binning by first using abs in your value, then in the end restoring its original signal by multiplying the result by val/abs(val). Your expression then becomes:

    pow(2, floor(ln(abs(val)) / ln(2))) * val/abs(val)
    

    Negative and zero

    If your database has both negative and zero values, you should wrap the ifnull around everything else. Otherwise, the val/abs(val) part would make you divide by zero, re-introducing nulls.

    ifnull(pow(2, floor(ln(abs(val)) / ln(2))) * val/abs(val), 0)