I want to reduce the amount of lines in a table, which has a lot of duplicated data. My first thought was to use some window function to define the date ranges to save in the table, so that whenever I need this info, the date ranges are just delimiters in the join condition. But then I noticed that some references were overlapping, so, I'm not sure which would be the best way to do it.
I'm using Postgres 9.3.
select distinct
min(obs_date) over (partition by equipment, temperature) as beg_obs_date,
max(obs_date) over (partition by equipment, temperature) as end_obs_date,
equipment,
temperature
from
( select generate_series('2016-05-01', '2016-05-08', '1 day'::interval)::date as obs_date,
'FREEZER_1'::varchar as equipment,
-15.20::real as temperature
union all
select generate_series('2016-05-09', '2016-05-15', '1 day'::interval)::date as obs_date,
'FREEZER_1'::varchar as equipment,
-20.00::real as temperature
union all
select generate_series('2016-05-16', '2016-06-10', '1 day'::interval)::date as obs_date,
'FREEZER_1'::varchar as equipment,
-15.20::real as temperature
) sq
I get:
beg_obs_date end_obs_date equipment temperature
2016-05-01 2016-06-10 FREEZER_1 -15,2
2016-05-09 2016-05-15 FREEZER_1 -20
What I want is:
beg_obs_date end_obs_date equipment temperature
2016-05-01 2016-05-08 FREEZER_1 -15,2
2016-05-09 2016-05-15 FREEZER_1 -20
2016-05-16 2016-06-10 FREEZER_1 -15,2
Any idea?
Thank you!
Use row_number()
to distinguish consecutive series. The data (slightly simplified) with packet added:
with the_data as (
select generate_series('2016-05-01', '2016-05-03', '1 day'::interval)::date as obs_date,
'FREEZER_1'::varchar as equipment,
-15.20::real as temperature
union all
select generate_series('2016-05-04', '2016-05-05', '1 day'::interval)::date as obs_date,
'FREEZER_1'::varchar as equipment,
-20.00::real as temperature
union all
select generate_series('2016-05-06', '2016-05-08', '1 day'::interval)::date as obs_date,
'FREEZER_1'::varchar as equipment,
-15.20::real as temperature
)
select
*,
row_number() over (partition by equipment, temperature order by obs_date)- row_number() over (order by obs_date) as packet
from the_data
obs_date | equipment | temperature | packet
------------+-----------+-------------+--------
2016-05-01 | FREEZER_1 | -15.2 | 0
2016-05-02 | FREEZER_1 | -15.2 | 0
2016-05-03 | FREEZER_1 | -15.2 | 0
2016-05-04 | FREEZER_1 | -20 | -3
2016-05-05 | FREEZER_1 | -20 | -3
2016-05-06 | FREEZER_1 | -15.2 | -2
2016-05-07 | FREEZER_1 | -15.2 | -2
2016-05-08 | FREEZER_1 | -15.2 | -2
(8 rows)
In max()
and min()
use packet
instead of temperature
:
with the_data as (
select generate_series('2016-05-01', '2016-05-03', '1 day'::interval)::date as obs_date,
'FREEZER_1'::varchar as equipment,
-15.20::real as temperature
union all
select generate_series('2016-05-04', '2016-05-05', '1 day'::interval)::date as obs_date,
'FREEZER_1'::varchar as equipment,
-20.00::real as temperature
union all
select generate_series('2016-05-06', '2016-05-08', '1 day'::interval)::date as obs_date,
'FREEZER_1'::varchar as equipment,
-15.20::real as temperature
)
select distinct
min(obs_date) over (partition by equipment, packet) as beg_obs_date,
max(obs_date) over (partition by equipment, packet) as end_obs_date,
equipment,
temperature
from (
select
*,
row_number() over (partition by equipment, temperature order by obs_date)- row_number() over (order by obs_date) as packet
from the_data
) s
order by 1;
beg_obs_date | end_obs_date | equipment | temperature
--------------+--------------+-----------+-------------
2016-05-01 | 2016-05-03 | FREEZER_1 | -15.2
2016-05-04 | 2016-05-05 | FREEZER_1 | -20
2016-05-06 | 2016-05-08 | FREEZER_1 | -15.2
(3 rows)