I need to group the rows of a query together to create bins (groups) of the rows. The range of the group should be dynamically passed to the query (e.g. from a variable @b = [4]).
I started with using row_number() function to get the rows numbered sequentially. But I have to idea how go on further. Here is my script:
select date, measure, row_number() over (order by date,measure)
from dwh.overview_all_data
And this is the result of my query:
date measure row_number
1998-01-11 AOX 1
1998-01-11 Ammonium 2
1998-01-11 Arsenic 3
1998-01-11 Atrazine 4
1998-01-11 Biochemical Oxygen 5
1998-01-11 Cadmium 6
1998-01-11 Calcium 7
1998-01-11 Cesium 8
1998-01-11 Chemical Oxygen Demand (Cr) 9
1998-01-11 Chemical Oxygen Demand (Mn) 10
1998-01-11 Chlorides 11
1998-01-11 Chromium 12
1998-01-11 Copper 13
1998-01-11 Dissolved oxygen 14
1998-01-11 Fecal coliforms 15
1998-01-11 Iron 16
let's say I want to have these 16 rows into 4 bins (b = [4]), I need also to define the start and end date of each bin (group). i.e. The result would looks like:
date measure row_number bin, startdate enddate
1998-01-11 AOX 1 1 1998-01-11 1998-01-11
1998-01-11 Ammonium 2 1 1998-01-11 1998-01-11
1998-01-11 Arsenic 3 1 1998-01-11 1998-01-11
1998-01-11 Atrazine 4 1 1998-01-11 1998-01-11
1998-01-11 Biochemical Oxygen 5 2 1998-01-11 1998-01-20
1998-01-15 Cadmium 6 2 1998-01-11 1998-01-20
1998-01-15 Calcium 7 2 1998-01-11 1998-01-20
1998-01-20 Cesium 8 2 1998-01-11 1998-01-20
1999-01-21 Chemical Oxygen Demand (Cr) 9 3 1999-01-21 2005-01-22
1999-01-22 Chemical Oxygen Demand (Mn) 10 3 1999-01-21 2005-01-22
1999-01-22 Chlorides 11 3 1999-01-21 2005-01-22
2005-01-22 Chromium 12 3 1999-01-21 2005-01-22
2005-02-01 Copper 13 4 2005-02-01 2007-04-01
2005-02-11 Dissolved oxygen 14 4 2005-02-01 2007-04-01
2005-03-15 Fecal coliforms 15 4 2005-02-01 2007-04-01
2007-04-01 Iron 16 4 2005-02-01 2007-04-01
I could figure it out:
Select t1.measure, t1.bin, Min(t1.date) startDate, Max(t1.date) EndDate, count(*)
from(
select date, measure, row_number() over (order by date,measure), floor(((row_number() over (order by date,measure))-1) /88) as bin
from dwh.overview_all_data
order by date, measure
) t1
group by t1.measure, t1.bin
order by t1.measure, t1.bin, startDate, EndDate
The number "88" in the bin column is calculated separately as following:
select 8819 / 100 -- = 88
The "100" is the number of bins (groups) that we want. It is given by the user.
The "8819" is the number of rows of the inner statement and It can be returned using this script:
select count(*) from
(
select date, measure, row_number() over (order by date, measure )
from dwh.overview_all_data
where date between '1998-01-01' and '2000-01-01'
order by date, measure
) t
by dividing these two numbers we get number of rows in each bin.