Bootstrapping Proportions of Categorical Variables in R or Stata


I need help running a bootstrap in either R or Stata. I want to estimate the proportion of respondents who say Yes and No to a question, e.g. on the effectiveness of a policy.

In Stata I have this code:

bs "summarize y1" "r(mean)", reps(200) size(770)

What should I use in place of r(mean) to estimate the proportions?

Also, I have this code in R:

test <- function (q13){
    test13 <- table(q13)
    rel_freq <- test13/sum(test13)
    return(rel_freq)
      }

results <- boot(data=q13, statistic=test,
                R=200)

How do I correct the code? I'm getting this error:

Error in statistic(data, original, ...) : unused argument(s) (original)


Solution

  • In Stata, you can bootstrap the proportion command, which also handles variables with more than two categories:

    //sample data

    sysuse auto, clear
    keep if (headroom==2.0 |headroom==2.5)
    gen prop=.
    replace prop=0 if headroom==2.0
    replace prop=1 if headroom==2.5
    

    //say 0 is yes and 1 is no

    set seed 123
    bootstrap _b, reps(100):proportion prop
    

    Update, following @Nick's comment: for a binary (0/1) variable the following is sufficient, because the mean of a 0/1 variable is the proportion of 1s:

    bootstrap r(mean), reps(100): summarize prop, meanonly
    


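    In R, you can do the following with the boot package and the mtcars data. Note that boot() calls the statistic with two arguments, the data and a vector of resampled indices: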

    library(boot)
    set.seed(123)
    x <- mtcars$vs
    # statistic(data, indices): proportion of resampled values equal to 0
    myprop <- function(x, i) {
      mean(x[i] == 0)
    }

    bootprop <- boot(x, myprop, R = 100)
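
The error in the question most likely arises because boot() passes both the data and an index vector to the statistic, while the function there accepts only one argument. As a rough sketch of how that function might be adapted to return the proportions of all categories at once: here `q13` is only a stand-in built from `mtcars$cyl` (an assumption, since the original data is not shown), and `prop_all` is a hypothetical helper name.

```r
library(boot)

# Stand-in for the asker's q13 (assumption): any categorical vector works.
q13 <- factor(mtcars$cyl)

# boot() calls statistic(data, indices), so the function takes two arguments.
# Subsetting a factor keeps all its levels, so table() returns one count per
# level and the result has a fixed length even if a resample misses a category.
prop_all <- function(data, i) {
  as.vector(prop.table(table(data[i])))
}

set.seed(123)
results <- boot(data = q13, statistic = prop_all, R = 200)

# Observed proportions, one per level of q13
results$t0

# Percentile confidence interval for the first level
boot.ci(results, type = "perc", index = 1)
```

Converting the variable to a factor first is a design choice: boot() expects the statistic to return a vector of the same length on every replicate, which a plain table() on a character or numeric vector does not guarantee.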