I am building a table for a data warehouse that needs to have a row for each change that occurs. The issue is that there are sometimes changes that occur in the subgroups and I can't figure out how to show those changes.
For example, I have the following table:
RowNumber | Code | CorrectedProductYear | ProductYear | Product | CategoryYear | Category | PartYear | Parts | KeepRow |
---|---|---|---|---|---|---|---|---|---|
1 | ABC | 2012 | 2012 | Sport Car | 1995 | Auto | 1980 | Wheels | N |
2 | ABC | 2012 | 2012 | Sport Car | 1996 | Cars | 1980 | Wheels | N |
3 | ABC | 2012 | 2012 | Sport Car | 1998 | Transport | 1980 | Wheels | Y |
4 | ABC | 2014 | 2014 | Sedan | 1995 | Auto | 1980 | Wheels | N |
5 | ABC | 2014 | 2014 | Sedan | 1996 | Cars | 1980 | Wheels | N |
6 | ABC | 2014 | 2014 | Sedan | 1998 | Transport | 1980 | Wheels | Y |
7 | ABC | 2018 | 2018 | Luxury Car | 1995 | Auto | 1980 | Wheels | N |
8 | ABC | 2018 | 2018 | Luxury Car | 1996 | Cars | 1980 | Wheels | N |
9 | ABC | 2018 | 2018 | Luxury Car | 1998 | Transport | 1980 | Wheels | N |
10 | ABC | 2016 | 2018 | Sedan | 2016 | Vehicle | 1980 | Wheels | Y |
11 | ABC | 2018 | 2018 | Luxury Car | 2017 | Motorcar | 1980 | Wheels | Y |
12 | ABC | 2018 | 2018 | Luxury Car | 2017 | Motorcar | 1980 | Wheels | N |
13 | ABC | 2019 | 2018 | Luxury Car | 2017 | Motorcar | 2019 | Fancy Wheels | Y |
14 | ABC | 2020 | 2020 | Super Luxury Car | 2017 | Motorcar | 2019 | Fancy Wheels | N |
15 | ABC | 2020 | 2020 | Super Luxury Car | 2017 | Motorcar | 2019 | Fancy Wheels | Y |
I marked the KeepRow with a 'Y' to show what the find result should look like. It should be this:
RowNumber | Code | CorrectedProductYear | Product | CategoryYear | Category | PartYear | Parts | KeepRow |
---|---|---|---|---|---|---|---|---|
3 | ABC | 2012 | Sport Car | 1998 | Transport | 1980 | Wheels | Y |
6 | ABC | 2014 | Sedan | 1998 | Transport | 1980 | Wheels | Y |
10 | ABC | 2016 | Sedan | 2016 | Vehicle | 1980 | Wheels | Y |
11 | ABC | 2018 | Luxury Car | 2017 | Motorcar | 1980 | Wheels | Y |
13 | ABC | 2019 | Luxury Car | 2017 | Motorcar | 2019 | Fancy Wheels | Y |
15 | ABC | 2020 | Super Luxury Car | 2017 | Motorcar | 2019 | Fancy Wheels | Y |
In this table the ProductYear is the "main" year. When the subgroupings (Category and Parts) have a change in their description/year, that needs to be captured in the ProductYear. This occurs in:
I know that I can do an Max Effective Date query to only pull the most recent CategoryYear and PartYear, but then I miss the changes that might occur between the ProductYear.
I've tried different variations of LAG and LEAD and well as LAST_VALUE and some of the other Window functions, but I'm at stuck and can't figure this one out.
I also think that this might not be able to be completed with just SQL. I do have the ability to do some SSIS if that is a better way to proceed.
Any help would be enormously appreciated!
Thank you.
I think this is something like what you are looking for. I had to split it into two queries as you are filtering once for CategoryYear and then again for PartYear. This might help you get on the right path:
SELECT RowNumber,Code,CorrectedProductYear,ProductYear,Product,CategoryYear,Category,PartYear,Parts
FROM
(
SELECT *, ROW_NUMBER() OVER(PARTITION BY Product order by ProductYear) RN,
LAG(ProductYear,1) OVER(order by ProductYear) LastProdYear,
LEAD(ProductYear,1) OVER(order by ProductYear) NextProdYear,
LAG(Category,1) OVER(Order by Product,ProductYear) LastCategory
FROM Products
) T
WHERE CategoryYear > LastProdYear and CategoryYear < NextProdYear
and KeepRow = 'Y' and Category <> LastCategory and RN > 1
UNION ALL
SELECT RowNumber,Code,CorrectedProductYear,ProductYear,Product,CategoryYear,Category,PartYear,Parts
FROM
(
SELECT *, ROW_NUMBER() OVER(PARTITION BY Product order by ProductYear) RN,
LAG(ProductYear,1) OVER(order by ProductYear) LastProdYear,
LEAD(ProductYear,1) OVER(order by ProductYear) NextProdYear,
LAG(Parts,1) OVER(Order by Product,ProductYear) LastParts
FROM Products
) T
WHERE PartYear > LastProdYear and PartYear < NextProdYear
and KeepRow = 'Y' and Parts <> LastParts and RN > 1