Search code examples
oracle-databasemedian

calculating confidence intervals for a median in oracle


I am trying to produce an oracle sql query which will not only calculate the median age but also the 95% confidence around it. To add a complication need to do it across groups in this case gender I have a table of people with their age and gender. I want to determine the median age for each group together with its 95% confidence intervals. My current attempt which fails is as follows.

select gender,
       median(age),
       count(*),
       percentile_cont(ROUND((COUNT(*)/2)-1.96*sqrt(COUNT(*))/2)/COUNT(*)) 
         within GROUP (ORDER BY age) lowmedianage,
       percentile_cont(ROUND((COUNT(*)/2)+1.96*sqrt(COUNT(*))/2)/COUNT(*)) 
         within GROUP (ORDER BY age) highmedianage
  from persontable
  group by gender

I get an error not a GROUP BY expression.


Solution

  • Using the formula from this book I'm ending with following query (I'm not sure if you handles the low and high ranges well; my interpretation is that you calculates a range of sequence numbers and you must look up the values from those positions).

    with tab as 
    -- add sequence per group
    (
    select gender, age,
    row_number() over (PARTITION  BY gender order by  gender, age) as seq
    from persontable
    ),
    -- get count
    N as (select gender, count(*) cnt from persontable group by gender),
    -- calculate sequence numbers of the CI
    ci_seq as (
    select gender,
    round(cnt/2 - (1.96 * sqrt(cnt)/2)) r,
    round(1 + cnt/2 + (1.96 * sqrt(cnt)/2)) s
    from n),
    -- calculate median
    med as (
    select 
      gender,
      median(age) median_age
    from  persontable
    group by gender),
    med2 as (
    select med.gender, median_age, r, s
    from med 
    join ci_seq on med.gender = ci_seq.gender 
    )
    select gender, median_age,
    (select age from tab where seq = r and gender = med2.gender) ci_from,
    (select age from tab where seq = s and gender = med2.gender) ci_to
    from med2
    ;
    

    Note also, that the formula only approximates the CI. You may also check this thread for alternative calculations.