sql postgresql validation pentaho pentaho-data-integration

Get valid date range. Minimum and maximum

I need to get the values of a table by validity range. Currently I make a group by the columns of the table and I make a min and max about the year; but this gives me the wrong result for the case of the example. I would need to solve it with sentence sql or with pentaho

Sample data:

year validity   Value1  Value2
2004               A     B
2006               A     C
2007               A     B
2008               A     B

SQL:

SELECT
    min(anio), max(anio), value1, value2
FROM tabla
GROUP BY
    value1, value2

Wrong result:

year min    year max    Value1  Value2
2004        2008           A      B
2006        2006           A      C

Expected result:

year min    year max    Value1  Value2
2004        2004         A        B
2007        2008         A        B
2006        2006         A        C

Please help on this issue.

Solution

You can combine the LAG() and SUM() window functions to separate the rows into groups according to your logic. Once separated, computing the min and max values is trivial.

For example:

select
  min(year_validity) as year_min,
  max(year_validity) as year_max,
  min(value1) as value1,
  min(value2) as value2
from (
  select
    *,
    sum(init) over(order by year_validity) as grp
  from (
    select *,
    case when not ( (value1, value2) = (
      lag(value1) over(order by year_validity),
      lag(value2) over(order by year_validity)
    ) ) then 1 else 0 end as init
    from tabla
  ) x
) y
group by grp
order by value1, value2

Result:

year_min  year_max  value1  value2
--------  --------  ------  ------
2004      2004      A       B
2007      2008      A       B
2006      2006      A       C

For the record, the data script for this case is:

create table tabla (
  year_validity int,
  value1 varchar(10),
  value2 varchar(10)
);

insert into tabla (year_validity, value1, value2) values (2004, 'A', 'B');
insert into tabla (year_validity, value1, value2) values (2006, 'A', 'C');
insert into tabla (year_validity, value1, value2) values (2007, 'A', 'B');
insert into tabla (year_validity, value1, value2) values (2008, 'A', 'B');