I am using SQL Server 2008 and am trying to calculate median values per date range.
Example: There are 4 dates with multiple values at each date, (1/1/16, 3/1/16, 7/1/16, 10/1/16), the median value for date 10/1/16 would be calculated from the values in the date range 1/1/16 – 7/1/16. The median value for date 7/1/16 would be calculated from the values in the date range 1/1/16 – 3/1/16.
If the date is 10/1/16, 10/1/16 values should not be included in the median (This true for all dates. Also, future dates should not be included in the calculation).
The query below calculates the median value for the MAX Visit Date. However, I need it to calculate the median for the other 3 Visit Dates as well. I have tried removing the MAX CTE and adding a join for all visit dates < than visit date but I could not get it to work. I have been unsuccessful in writing this so far, so any help would be great. I have included sample data and my expected results below.
EDIT: Would some kind of recursion possibly work?
;CREATE TABLE #TEST(QUESTION VARCHAR(15), VISIT_DATE DATE, VALUE INT)
;INSERT #TEST(QUESTION, VISIT_DATE, VALUE)
VALUES
('ABC', '1/1/2016', '80'),
('ABC', '1/1/2016', '90'),
('ABC', '1/1/2016', '100'),
('ABC', '3/1/2016', '70'),
('ABC', '3/1/2016', '80'),
('ABC', '3/1/2016', '90'),
('ABC', '3/1/2016', '100'),
('ABC', '7/1/2016', '50'),
('ABC', '7/1/2016', '60'),
('ABC', '7/1/2016', '70'),
('ABC', '10/1/2016', '10'),
('ABC', '10/1/2016', '20'),
('ABC', '10/1/2016', '30'),
('ABC', '10/1/2016', '40')
;WITH MAX_VISITDATE AS (
SELECT MAX(VISIT_DATE) AS MAX_VISITDATE
FROM #TEST
), MEDIAN AS (
SELECT RN.Question, AVG(RN.VALUE) AS GroupMedianPastQtrs
FROM
( SELECT QUESTION, VALUE, ROW_NUMBER() OVER (PARTITION BY QUESTION ORDER BY VALUE) AS ROWNUMBER, COUNT(*) OVER (PARTITION BY Question) AS QuestionCount
FROM #TEST T
WHERE VISIT_DATE NOT IN (SELECT MAX_VISITDATE FROM MAX_VISITDATE)
) RN
WHERE RN.ROWNUMBER IN (RN.QuestionCount/2+1, (RN.QuestionCount+1)/2)
GROUP BY RN.Question
)
SELECT *
FROM #TEST T
INNER JOIN MEDIAN ON T.Question = MEDIAN.Question
--Expected Results:
Question|Visit_DAte |Value|GroupMedian |
--------|-----------|-----|-------------|
'ABC' |'1/1/2016' |'80' |'' |--No Median, no previous values
'ABC' |'1/1/2016' |'90' |'' |--No Median, no previous values
'ABC' |'1/1/2016' |'100'|'' |--No Median, no previous values
'ABC' |'3/1/2016' |'70' |'90' |--Median value from date 1/1/16
'ABC' |'3/1/2016' |'80' |'90' |--Median value from date 1/1/16
'ABC' |'3/1/2016' |'90' |'90' |--Median value from date 1/1/16
'ABC' |'3/1/2016' |'100'|'90' |--Median value from date 1/1/16
'ABC' |'7/1/2016' |'50' |'90' |--Median value from date range 1/1/16 to 3/1/16
'ABC' |'7/1/2016' |'60' |'90' |--Median value from date range 1/1/16 to 3/1/16
'ABC' |'7/1/2016' |'70' |'90' |--Median value from date range 1/1/16 to 3/1/16
'ABC' |'10/1/2016'|'10' |'80' |--Median value from date range 1/1/16 to 7/1/16
'ABC' |'10/1/2016'|'20' |'80' |--Median value from date range 1/1/16 to 7/1/16
'ABC' |'10/1/2016'|'30' |'80' |--Median value from date range 1/1/16 to 7/1/16
'ABC' |'10/1/2016'|'40' |'80' |--Median value from date range 1/1/16 to 7/1/16
I don't have a SQL Server 2008 box to test this. So I tried my best to cross-check every function below is available in 2008:
;WITH
tmp AS
(
SELECT a.QUESTION
, a.VISIT_DATE
, b.VALUE
, ROW_NUMBER() OVER (PARTITION BY a.QUESTION, a.VISIT_DATE ORDER BY b.VALUE)
AS RowNumber
, FLOOR(CONVERT(float, COUNT(b.Value) OVER (PARTITION BY a.QUESTION, a.VISIT_DATE) + 1) / 2)
AS LowerMedianRowNumber
, CEILING(CONVERT(float, COUNT(b.Value) OVER (PARTITION BY a.QUESTION, a.VISIT_DATE) + 1) / 2)
AS UpperMedianRowNumber
FROM (
SELECT DISTINCT
QUESTION
, VISIT_DATE
FROM #TEST
) a
INNER JOIN #TEST b ON a.QUESTION = b.QUESTION
AND a.VISIT_DATE > b.VISIT_DATE
),
GroupMedian AS
(
SELECT QUESTION
, VISIT_DATE
, AVG(Value) AS MedianValue
FROM tmp
WHERE RowNumber IN (LowerMedianRowNumber, UpperMedianRowNumber)
GROUP BY QUESTION
, VISIT_DATE
)
SELECT a.*
, b.MedianValue
FROM #TEST a
LEFT JOIN GroupMedian b ON a.QUESTION = b.QUESTION
AND a.VISIT_DATE = b.VISIT_DATE
ORDER BY QUESTION
, VISIT_DATE