
How can I collapse my dataset to medians and 95% confidence intervals of the median in Stata?


I wish to collapse my dataset and (A) obtain medians by group, and (B) obtain the 95% confidence intervals for those medians.

I can achieve (A) by using collapse (p50) median = cost, by(group).
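For reference, this is what (A) produces on the data example below. The block repeats the data so that it runs on its own.

```stata
clear
input id group cost
1 0 20
2 0 40
3 0 50
4 0 40
5 0 30
6 1 20
7 1 10
8 1 10
9 1 60
10 1 30
end

* (A) one observation per group, holding the median of cost
collapse (p50) median = cost, by(group)
list
```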

I can obtain the confidence intervals for each group using bysort group: centile cost, c(50), but ideally I want to do this in a collapse-like way, creating a collapsed dataset of medians, lower limits (ll) and upper limits (ul) for each group, so that I can export it for graphing in Excel.
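centile leaves its results in r(): with a single requested centile, the estimate is r(c_1) and the confidence limits are r(lb_1) and r(ub_1) (binomial-exact by default). The sketch below runs it for one group only, to show the stored results that any collapse-like approach has to capture. With just five observations per group, even the widest interval built from order statistics (sample minimum to sample maximum) covers the median with probability 1 - 2(0.5)^5 = 0.9375, so 95% cannot actually be attained here; the limits reported for this data coincide with each group's minimum and maximum.

```stata
clear
input id group cost
1 0 20
2 0 40
3 0 50
4 0 40
5 0 30
6 1 20
7 1 10
8 1 10
9 1 60
10 1 30
end

* Median of cost for group 0, with the default 95% confidence interval
centile cost if group == 0, centile(50)

* The stored results that a collapse-like approach needs to keep
display "median = " r(c_1) ", ll = " r(lb_1) ", ul = " r(ub_1)
```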

Data example:

input id group cost
1 0 20
2 0 40
3 0 50
4 0 40
5 0 30
6 1 20
7 1 10
8 1 10
9 1 60
10 1 30
end

Desired dataset (or something similar):

. list

     +-----------------------+
     | group   p50   ll   ul |
     |-----------------------|
  1. |     0    40   20   50 |
  2. |     1    20   10   60 |
     +-----------------------+

Solution

    clear
    input id group cost
    1 0 20
    2 0 40
    3 0 50
    4 0 40
    5 0 30
    6 1 20
    7 1 10
    8 1 10
    9 1 60
    10 1 30
    end
    
    statsby p50=r(c_1) ll=r(lb_1) ul=r(ub_1), by(group) clear: centile cost, centile(50)
    
    list 
    
         +-----------------------+
         | group   p50   ll   ul |
         |-----------------------|
      1. |     0    40   20   50 |
      2. |     1    20   10   60 |
         +-----------------------+
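Because statsby with the clear option leaves the results in memory as an ordinary dataset, the Excel step mentioned in the question can follow directly with export excel (available since Stata 12). A minimal sketch; the file name group_medians.xlsx is a placeholder.

```stata
clear
input id group cost
1 0 20
2 0 40
3 0 50
4 0 40
5 0 30
6 1 20
7 1 10
8 1 10
9 1 60
10 1 30
end

* One row per group: median and confidence limits from centile
statsby p50=r(c_1) ll=r(lb_1) ul=r(ub_1), by(group) clear: centile cost, centile(50)

* Write the collapsed dataset to Excel, variable names in the first row
* (group_medians.xlsx is a placeholder file name)
export excel using "group_medians.xlsx", firstrow(variables) replace
```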
    

In addition to the usual help and manual entries for statsby and centile, this paper includes a riff on essentially this problem of accumulating estimates and confidence intervals.
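The same accumulation can also be done by hand, which can help when each group needs more than one command. The sketch below loops over the groups found by levelsof, runs centile quietly, and posts the stored results to a temporary file with postfile. It is one possible approach, not necessarily the one the paper takes.

```stata
clear
input id group cost
1 0 20
2 0 40
3 0 50
4 0 40
5 0 30
6 1 20
7 1 10
8 1 10
9 1 60
10 1 30
end

tempname h
tempfile results

* Declare the results dataset: one row per group
postfile `h' group p50 ll ul using `results'

levelsof group, local(groups)
foreach g of local groups {
    quietly centile cost if group == `g', centile(50)
    * Each parenthesised expression becomes one variable of a new observation
    post `h' (`g') (r(c_1)) (r(lb_1)) (r(ub_1))
}
postclose `h'

* Replace the data in memory with the accumulated results
use `results', clear
list
```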