Search code examples
rregressionlogistic-regression

Cluster standard errors for ordered logit R polr - values deleted in estimation


I am quite new to R and used to pretty basic application. Now I have encountered a problem I need help with:

I am looking for a way to cluster standard errors for an ordered logistic regression (my estimation is similar to this example)

I already tried robcov and vcovCL and they give me similar error messages:

  • Error in meatCL(x, cluster = cluster, type = type, ...) : number of observations in 'cluster' and 'estfun()' do not match
  • Error in u[, ii] <- ui : number of items to replace is not a multiple of replacement length

Many thanks in advance!

Edit: I found some more information about the missing values but that does not seem to be the problem - because it persists even if I work around it using this answer, or when use a dataset without NA's. Just as in the example below. The problem seems to be that polr does not give me the residuals as part of the output. How can I work around this?

 dat <- read.dta("https://stats.idre.ucla.edu/stat/data/ologit.dta")
    length(dat$apply)
    twenty <- seq(from=1, to=20, by=1)
    dat$clustervar<-sample(twenty, size=400, replace=TRUE)



    m <- polr(apply ~ pared + public + gpa, data = dat, Hess=TRUE)
    vcovCL <- function(x, cluster.by, type="sss", dfcw=1){
      # R-codes (www.r-project.org) for computing
      # clustered-standard errors. Mahmood Arai, Jan 26, 2008.

      # The arguments of the function are:
      # fitted model, cluster1 and cluster2
      # You need to install libraries `sandwich' and `lmtest'

      # reweighting the var-cov matrix for the within model
      require(sandwich)
      cluster <- cluster.by
      M <- length(unique(cluster))
      N <- length(cluster)
      stopifnot(N == length(x$residuals))
      K <- x$rank
      ##only Stata small-sample correction supported right now
      ##see plm >= 1.5-4
      stopifnot(type=="sss")
      if(type=="sss"){
        dfc <- (M/(M-1))*((N-1)/(N-K))
      }
      uj  <- apply(estfun(x), 2, function(y) tapply(y, cluster, sum))
      mycov <- dfc * sandwich(x, meat=crossprod(uj)/N) * dfcw
      return(mycov)
    }
    vcovCL(dat, m, dat$clustervar)

This gives me:

Error: N == length(x$residuals) is not TRUE
Called from: vcovCL(dat, m, dat$clustervar)

Solution

  • I had success following the help page for ?sandwich::vcovCL which shows that the first arg to the function is a model object. Needed to use the :: operator to mask the function you offered:

     m <- polr(apply ~ pared + public + gpa, data = dat, Hess=TRUE)
     ( clval <- sandwich::vcovCL(m, dat$clustervar) )
                                      pared       public         gpa unlikely|somewhat likely
    pared                       0.085218306  0.005588259  0.04584255               0.15545404
    public                      0.005588259  0.092283173 -0.01890725              -0.05875859
    gpa                         0.045842552 -0.018907254  0.07067573               0.22455931
    unlikely|somewhat likely    0.155454041 -0.058758588  0.22455931               0.72408670
    somewhat likely|very likely 0.165079639 -0.058282514  0.23631756               0.75713049
                                somewhat likely|very likely
    pared                                        0.16507964
    public                                      -0.05828251
    gpa                                          0.23631756
    unlikely|somewhat likely                     0.75713049
    somewhat likely|very likely                  0.80749182
    

    You may need to use the diag of that matrix if you want Wald tests. I think that is what coeftest will deliver:

    coeftest( m, vcov = clval)
    
    t test of coefficients:
    
            Estimate Std. Error t value  Pr(>|t|)    
    pared   1.047690   0.291922  3.5889 0.0003738 ***
    public -0.058786   0.303781 -0.1935 0.8466565    
    gpa     0.615941   0.265849  2.3169 0.0210210 *  
    ---
    Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
    

    The other question that prompted the successful search of Rhelp and finding the answer by Achim Zeileis is here