Search code examples
postgresqldategroupingwindow-functionspostgresql-9.5

Grouping rows of a query together


I need to group the rows of a query together to create bins (groups) of the rows. The range of the group should be dynamically passed to the query (e.g. from a variable @b = [4]).

I started with using row_number() function to get the rows numbered sequentially. But I have to idea how go on further. Here is my script:

select date, measure, row_number() over (order by date,measure)
from dwh.overview_all_data

And this is the result of my query:

date  measure  row_number
1998-01-11  AOX  1
1998-01-11  Ammonium  2
1998-01-11  Arsenic  3
1998-01-11  Atrazine  4
1998-01-11  Biochemical Oxygen  5
1998-01-11  Cadmium  6
1998-01-11  Calcium  7
1998-01-11  Cesium  8
1998-01-11  Chemical Oxygen Demand (Cr)  9
1998-01-11  Chemical Oxygen Demand (Mn)  10
1998-01-11  Chlorides  11
1998-01-11  Chromium  12
1998-01-11  Copper  13
1998-01-11  Dissolved oxygen  14
1998-01-11  Fecal coliforms  15
1998-01-11  Iron  16

let's say I want to have these 16 rows into 4 bins (b = [4]), I need also to define the start and end date of each bin (group). i.e. The result would looks like:

date  measure  row_number  bin, startdate  enddate
1998-01-11  AOX  1  1  1998-01-11  1998-01-11
1998-01-11  Ammonium  2  1  1998-01-11  1998-01-11
1998-01-11  Arsenic  3  1  1998-01-11  1998-01-11
1998-01-11  Atrazine  4  1  1998-01-11  1998-01-11
1998-01-11  Biochemical Oxygen  5  2  1998-01-11  1998-01-20
1998-01-15  Cadmium  6  2  1998-01-11  1998-01-20
1998-01-15  Calcium  7  2  1998-01-11  1998-01-20
1998-01-20  Cesium  8  2  1998-01-11  1998-01-20
1999-01-21  Chemical Oxygen Demand (Cr)  9  3  1999-01-21  2005-01-22
1999-01-22  Chemical Oxygen Demand (Mn)  10  3  1999-01-21  2005-01-22
1999-01-22  Chlorides  11  3  1999-01-21  2005-01-22
2005-01-22  Chromium  12  3  1999-01-21  2005-01-22
2005-02-01  Copper  13  4  2005-02-01  2007-04-01
2005-02-11  Dissolved oxygen  14  4  2005-02-01  2007-04-01
2005-03-15  Fecal coliforms  15  4  2005-02-01  2007-04-01
2007-04-01  Iron  16  4  2005-02-01  2007-04-01

Solution

  • I could figure it out:

    Select t1.measure, t1.bin, Min(t1.date) startDate, Max(t1.date) EndDate, count(*)
    from(
    select date, measure, row_number() over (order by date,measure), floor(((row_number() over (order by date,measure))-1) /88) as bin
    from dwh.overview_all_data
      order by date, measure
    ) t1
    group by t1.measure, t1.bin
    order by t1.measure, t1.bin, startDate, EndDate
    

    The number "88" in the bin column is calculated separately as following:

    select 8819 / 100 -- = 88
    

    The "100" is the number of bins (groups) that we want. It is given by the user.

    The "8819" is the number of rows of the inner statement and It can be returned using this script:

    select count(*) from
        (
          select date, measure, row_number() over (order by date, measure )
          from dwh.overview_all_data
          where date between '1998-01-01' and '2000-01-01'
          order by date, measure
        ) t
    

    by dividing these two numbers we get number of rows in each bin.