I am trying to produce an oracle sql query which will not only calculate the median age but also the 95% confidence around it. To add a complication need to do it across groups in this case gender I have a table of people with their age and gender. I want to determine the median age for each group together with its 95% confidence intervals. My current attempt which fails is as follows.
select gender,
median(age),
count(*),
percentile_cont(ROUND((COUNT(*)/2)-1.96*sqrt(COUNT(*))/2)/COUNT(*))
within GROUP (ORDER BY age) lowmedianage,
percentile_cont(ROUND((COUNT(*)/2)+1.96*sqrt(COUNT(*))/2)/COUNT(*))
within GROUP (ORDER BY age) highmedianage
from persontable
group by gender
I get an error not a GROUP BY expression.
Using the formula from this book I'm ending with following query (I'm not sure if you handles the low and high ranges well; my interpretation is that you calculates a range of sequence numbers and you must look up the values from those positions).
with tab as
-- add sequence per group
(
select gender, age,
row_number() over (PARTITION BY gender order by gender, age) as seq
from persontable
),
-- get count
N as (select gender, count(*) cnt from persontable group by gender),
-- calculate sequence numbers of the CI
ci_seq as (
select gender,
round(cnt/2 - (1.96 * sqrt(cnt)/2)) r,
round(1 + cnt/2 + (1.96 * sqrt(cnt)/2)) s
from n),
-- calculate median
med as (
select
gender,
median(age) median_age
from persontable
group by gender),
med2 as (
select med.gender, median_age, r, s
from med
join ci_seq on med.gender = ci_seq.gender
)
select gender, median_age,
(select age from tab where seq = r and gender = med2.gender) ci_from,
(select age from tab where seq = s and gender = med2.gender) ci_to
from med2
;
Note also, that the formula only approximates the CI. You may also check this thread for alternative calculations.