Search code examples
postgresqlpostgresql-9.3window-functions

Using window functions to reduce amount of lines implementing data ranges


I want to reduce the amount of lines in a table, which has a lot of duplicated data. My first thought was to use some window function to define the date ranges to save in the table, so that whenever I need this info, the date ranges are just delimiters in the join condition. But then I noticed that some references were overlapping, so, I'm not sure which would be the best way to do it.

I'm using Postgres 9.3.

select  distinct    
    min(obs_date) over (partition by equipment, temperature) as beg_obs_date,
    max(obs_date) over (partition by equipment, temperature) as end_obs_date,
    equipment, 
    temperature
from    
(   select  generate_series('2016-05-01', '2016-05-08', '1 day'::interval)::date as obs_date,
        'FREEZER_1'::varchar as equipment,
        -15.20::real as temperature
    union all   
    select  generate_series('2016-05-09', '2016-05-15', '1 day'::interval)::date as obs_date,
        'FREEZER_1'::varchar as equipment,
        -20.00::real as temperature
    union all

    select  generate_series('2016-05-16', '2016-06-10', '1 day'::interval)::date as obs_date,
        'FREEZER_1'::varchar as equipment,
        -15.20::real as temperature
) sq

I get:

beg_obs_date    end_obs_date    equipment   temperature
2016-05-01      2016-06-10      FREEZER_1   -15,2
2016-05-09      2016-05-15      FREEZER_1   -20

What I want is:

beg_obs_date    end_obs_date    equipment   temperature
2016-05-01      2016-05-08      FREEZER_1   -15,2
2016-05-09      2016-05-15      FREEZER_1   -20
2016-05-16      2016-06-10      FREEZER_1   -15,2

Any idea?

Thank you!


Solution

  • Use row_number() to distinguish consecutive series. The data (slightly simplified) with packet added:

    with the_data as (
        select  generate_series('2016-05-01', '2016-05-03', '1 day'::interval)::date as obs_date,
            'FREEZER_1'::varchar as equipment,
            -15.20::real as temperature
        union all   
        select  generate_series('2016-05-04', '2016-05-05', '1 day'::interval)::date as obs_date,
            'FREEZER_1'::varchar as equipment,
            -20.00::real as temperature
        union all
        select  generate_series('2016-05-06', '2016-05-08', '1 day'::interval)::date as obs_date,
            'FREEZER_1'::varchar as equipment,
            -15.20::real as temperature
        )
    select 
        *, 
        row_number() over (partition by equipment, temperature order by obs_date)- row_number() over (order by obs_date) as packet
    from the_data
    
      obs_date  | equipment | temperature | packet 
    ------------+-----------+-------------+--------
     2016-05-01 | FREEZER_1 |       -15.2 |      0
     2016-05-02 | FREEZER_1 |       -15.2 |      0
     2016-05-03 | FREEZER_1 |       -15.2 |      0
     2016-05-04 | FREEZER_1 |         -20 |     -3
     2016-05-05 | FREEZER_1 |         -20 |     -3
     2016-05-06 | FREEZER_1 |       -15.2 |     -2
     2016-05-07 | FREEZER_1 |       -15.2 |     -2
     2016-05-08 | FREEZER_1 |       -15.2 |     -2
    (8 rows)
    

    In max() and min() use packet instead of temperature:

    with the_data as (
        select  generate_series('2016-05-01', '2016-05-03', '1 day'::interval)::date as obs_date,
            'FREEZER_1'::varchar as equipment,
            -15.20::real as temperature
        union all   
        select  generate_series('2016-05-04', '2016-05-05', '1 day'::interval)::date as obs_date,
            'FREEZER_1'::varchar as equipment,
            -20.00::real as temperature
        union all
        select  generate_series('2016-05-06', '2016-05-08', '1 day'::interval)::date as obs_date,
            'FREEZER_1'::varchar as equipment,
            -15.20::real as temperature
        )
    select distinct    
        min(obs_date) over (partition by equipment, packet) as beg_obs_date,
        max(obs_date) over (partition by equipment, packet) as end_obs_date,
        equipment, 
        temperature
    from (
        select 
            *, 
            row_number() over (partition by equipment, temperature order by obs_date)- row_number() over (order by obs_date) as packet
        from the_data
    ) s
    order by 1;
    
     beg_obs_date | end_obs_date | equipment | temperature 
    --------------+--------------+-----------+-------------
     2016-05-01   | 2016-05-03   | FREEZER_1 |       -15.2
     2016-05-04   | 2016-05-05   | FREEZER_1 |         -20
     2016-05-06   | 2016-05-08   | FREEZER_1 |       -15.2
    (3 rows)