Search code examples
sqldb2

Dividing Overlapping Periods in SQL


I have this table in SQL (called "df"):

         v1         v2   v3         v4 name
 2015-06-23 2024-06-09 2013 2015-03-31  red
 2015-06-23 2024-06-09 2014 2015-03-31  red
 2015-06-23 2024-06-09 2018 2019-03-18  red
 2015-06-23 2024-06-09 2020 2021-02-21  red
 2015-06-23 2024-06-09 2023 2024-03-15  red
 2015-06-23 2024-06-09 2013 2015-03-31 blue
 2015-06-23 2024-06-09 2014 2015-03-31 blue
 2015-06-23 2024-06-09 2018 2019-03-18 blue


CREATE TABLE df (
    v1 DATE,
    v2 DATE,
    v3 INT,
    v4 DATE,
    name VARCHAR(10)
);

INSERT INTO df (v1, v2, v3, v4, name) VALUES
('2015-06-23', '2024-06-09', 2013, '2015-03-31', 'red'),
('2015-06-23', '2024-06-09', 2014, '2015-03-31', 'red'),
('2015-06-23', '2024-06-09', 2018, '2019-03-18', 'red'),
('2015-06-23', '2024-06-09', 2020, '2021-02-21', 'red'),
('2015-06-23', '2024-06-09', 2023, '2024-03-15', 'red'),
('2015-06-23', '2024-06-09', 2013, '2015-03-31', 'blue'),
('2015-06-23', '2024-06-09', 2014, '2015-03-31', 'blue'),
('2015-06-23', '2024-06-09', 2018, '2019-03-18', 'blue');

For each color, I am trying to perform the following calculation:

  • Step 1: How many years are there between : [max(V2) - min(V1)] + 1?
  • Step 2: What values of V3 are between : Year[max(V2) - 1] to Year[min(V1) - 1]
  • Step 3: How many values of V3 are in the answer of Step 2?
  • Step 4: Division : Step 3/ Step 1

As an example, in the case of 'red':

  • Step 1: 10 Years : 2015, 2016, 2017, 2018, 2019, 2020, 2021, 2022, 2023, 2024
  • Step 2: Years: 2014, 2015, 2016, 2017, 2018, 2019, 2020, 2021, 2022, 2023
  • Step 3: Number of Years = 4 (2014, 2018,2020, 2023)
  • Step 4: 4/10 = 0.4

I tried to write the SQL code for this problem using a separate CTE for each step:

WITH step_1 AS (
    SELECT
        name,
        YEAR(MAX(v2)) - YEAR(MIN(v1)) + 1 AS YearRange
    FROM df
    GROUP BY name
),
step_2_part1 AS (
    SELECT
        name,
        YEAR(MAX(v2)) - 1 AS MaxYear,
        YEAR(MIN(v1)) - 1 AS MinYear
    FROM df
    GROUP BY name
),
step_2_part2 AS (
    SELECT
        d.name,
        d.v3
    FROM df d
    JOIN step_2_part1 s2p1 ON d.name = s2p1.name
    WHERE d.v3 BETWEEN s2p1.MinYear AND s2p1.MaxYear
),
step_3 AS (
    SELECT
        name,
        COUNT(v3) AS V3Count
    FROM step_2_part2
    GROUP BY name
),
step_4 AS (
    SELECT
        s1.name,
        CASE
            WHEN s1.YearRange = 0 THEN 'Error: denominator is 0'
            ELSE CAST(s3.V3Count AS FLOAT) / s1.YearRange
        END AS Result
    FROM step_1 s1
    JOIN step_3 s3 ON s1.name = s3.name
)
SELECT * FROM step_4;

My Question: I am not sure if I am doing Step 3 correctly. I am not sure if I need to add an extra CTE and use a sub-select/subquery along with the IN statement to correctly identify this range of years.

Can someone please help me do this correctly?

Thanks!


Solution

  • try:

    WITH step_1 AS (
        SELECT
            name,
            YEAR(MAX(v2)) - YEAR(MIN(v1)) + 1 AS YearRange
        FROM df
        GROUP BY name
    ),
    step_2_part1 AS (
        SELECT
            name,
            YEAR(MAX(v2)) - 1 AS MaxYear,
            YEAR(MIN(v1)) - 1 AS MinYear
        FROM df
        GROUP BY name
    ),
    step_2_part2 AS (
        SELECT
            d.name,
            d.v3
        FROM df d
        JOIN step_2_part1 s2p1 ON d.name = s2p1.name
        WHERE d.v3 BETWEEN s2p1.MinYear AND s2p1.MaxYear
    ),
    step_3 AS (
        SELECT
            name,
            COUNT(v3) AS V3Count
        FROM step_2_part2
        GROUP BY name
    ),
    step_4 AS (
        SELECT
            s1.name,
            CASE
                WHEN s1.YearRange = 0 THEN 'Error: denominator is 0'
                ELSE CAST(s3.V3Count AS FLOAT) / s1.YearRange
            END AS Result
        FROM step_1 s1
        JOIN step_3 s3 ON s1.name = s3.name
    )
    SELECT * FROM step_4;