I have this table in SQL (called "df"):
v1 v2 v3 v4 name
2015-06-23 2024-06-09 2013 2015-03-31 red
2015-06-23 2024-06-09 2014 2015-03-31 red
2015-06-23 2024-06-09 2018 2019-03-18 red
2015-06-23 2024-06-09 2020 2021-02-21 red
2015-06-23 2024-06-09 2023 2024-03-15 red
2015-06-23 2024-06-09 2013 2015-03-31 blue
2015-06-23 2024-06-09 2014 2015-03-31 blue
2015-06-23 2024-06-09 2018 2019-03-18 blue
CREATE TABLE df (
v1 DATE,
v2 DATE,
v3 INT,
v4 DATE,
name VARCHAR(10)
);
INSERT INTO df (v1, v2, v3, v4, name) VALUES
('2015-06-23', '2024-06-09', 2013, '2015-03-31', 'red'),
('2015-06-23', '2024-06-09', 2014, '2015-03-31', 'red'),
('2015-06-23', '2024-06-09', 2018, '2019-03-18', 'red'),
('2015-06-23', '2024-06-09', 2020, '2021-02-21', 'red'),
('2015-06-23', '2024-06-09', 2023, '2024-03-15', 'red'),
('2015-06-23', '2024-06-09', 2013, '2015-03-31', 'blue'),
('2015-06-23', '2024-06-09', 2014, '2015-03-31', 'blue'),
('2015-06-23', '2024-06-09', 2018, '2019-03-18', 'blue');
For each color, I am trying to perform the following calculation:
As an example, in the case of 'red':
I tried to write the SQL code for this problem using a separate CTE for each step:
WITH step_1 AS (
SELECT
name,
YEAR(MAX(v2)) - YEAR(MIN(v1)) + 1 AS YearRange
FROM df
GROUP BY name
),
step_2_part1 AS (
SELECT
name,
YEAR(MAX(v2)) - 1 AS MaxYear,
YEAR(MIN(v1)) - 1 AS MinYear
FROM df
GROUP BY name
),
step_2_part2 AS (
SELECT
d.name,
d.v3
FROM df d
JOIN step_2_part1 s2p1 ON d.name = s2p1.name
WHERE d.v3 BETWEEN s2p1.MinYear AND s2p1.MaxYear
),
step_3 AS (
SELECT
name,
COUNT(v3) AS V3Count
FROM step_2_part2
GROUP BY name
),
step_4 AS (
SELECT
s1.name,
CASE
WHEN s1.YearRange = 0 THEN 'Error: denominator is 0'
ELSE CAST(s3.V3Count AS FLOAT) / s1.YearRange
END AS Result
FROM step_1 s1
JOIN step_3 s3 ON s1.name = s3.name
)
SELECT * FROM step_4;
My Question: I am not sure if I am doing Step 3 correctly. I am not sure if I need to add an extra CTE and use a sub-select/subquery along with the IN statement to correctly identify this range of years.
Can someone please help me do this correctly?
Thanks!
try:
WITH step_1 AS (
SELECT
name,
YEAR(MAX(v2)) - YEAR(MIN(v1)) + 1 AS YearRange
FROM df
GROUP BY name
),
step_2_part1 AS (
SELECT
name,
YEAR(MAX(v2)) - 1 AS MaxYear,
YEAR(MIN(v1)) - 1 AS MinYear
FROM df
GROUP BY name
),
step_2_part2 AS (
SELECT
d.name,
d.v3
FROM df d
JOIN step_2_part1 s2p1 ON d.name = s2p1.name
WHERE d.v3 BETWEEN s2p1.MinYear AND s2p1.MaxYear
),
step_3 AS (
SELECT
name,
COUNT(v3) AS V3Count
FROM step_2_part2
GROUP BY name
),
step_4 AS (
SELECT
s1.name,
CASE
WHEN s1.YearRange = 0 THEN 'Error: denominator is 0'
ELSE CAST(s3.V3Count AS FLOAT) / s1.YearRange
END AS Result
FROM step_1 s1
JOIN step_3 s3 ON s1.name = s3.name
)
SELECT * FROM step_4;