Search code examples
rpackageordinal

linear predictor - ordered probit (ordinal, clm)


I have got a question regarding the ordinal package in R or specifically regarding the predict.clm() function. I would like to calculate the linear predictor of an ordered probit estimation. With the polr function of the MASS package the linear predictor can be accessed by object$lp. It gives me on value for each line and is in line with what I understand what the linear predictor is namely X_i'beta. If I however use the predict.clm(object, newdata,"linear.predictor") on an ordered probit estimation with clm() I get a list with the elements eta1 and eta2,

  1. with one column each, if the newdata contains the dependent variable
  2. where each element contains as many columns as levels in the dependent variable, if the newdata doesn't contain the dependent variable

Unfortunately I don't have a clue what that means. Also in the documentations and papers of the author I don't find any information about it. Would one of you be so nice to enlighten me? This would be great.

Cheers,

AK


Solution

  • UPDATE (after comment):

    Basic clm model is defined like this (see clm tutorial for details):

    Generating data:

    library(ordinal)
    set.seed(1)
    test.data = data.frame(y=gl(4,5),
                           x=matrix(c(sample(1:4,20,T)+rnorm(20), rnorm(20)), ncol=2))
    head(test.data) # two independent variables 
    test.data$y     # four levels in y
    

    Constructing models:

    fm.polr <- polr(y ~ x) # using polr
    fm.clm  <- clm(y ~ x)  # using clm
    

    Now we can access thetas and betas (see formula above):

    # Thetas
    fm.polr$zeta # using polr
    fm.clm$alpha # using clm
    # Betas
    fm.polr$coefficients # using polr
    fm.clm$beta          # using clm
    

    Obtaining linear predictors (only parts without theta on the right side of the formula):

    fm.polr$lp                                                 # using polr
    apply(test.data[,2:3], 1, function(x) sum(fm.clm$beta*x))  # using clm
    

    New data generation:

    # Contains only independent variables
    new.data <- data.frame(x=matrix(c(rnorm(10)+sample(1:4,10,T), rnorm(10)), ncol=2))
    new.data[1,] <- c(0,0)  # intentionally for demonstration purpose
    new.data
    

    There are four types of predictions available for clm model. We are interested in type=linear.prediction, which returns a list with two matrices: eta1 and eta2. They contain linear predictors for each observation in new.data:

    lp.clm <- predict(fm.clm, new.data, type="linear.predictor")
    lp.clm
    

    Note 1: eta1 and eta2 are literally equal. Second is just a rotation of eta1 by 1 in j index. Thus, they leave left side and right side of linear predictor scale opened respectively.

    all.equal(lp.clm$eta1[,1:3], lp.clm$eta2[,2:4], check.attributes=FALSE)
    # [1] TRUE
    

    Note 2: Prediction for first line in new.data is equal to thetas (as far as we set this line to zeros).

    all.equal(lp.clm$eta1[1,1:3], fm.clm$alpha, check.attributes=FALSE)
    # [1] TRUE
    

    Note 3: We can manually construct such predictions. For instance, prediction for second line in new.data:

    second.line <- fm.clm$alpha - sum(fm.clm$beta*new.data[2,])
    all.equal(lp.clm$eta1[2,1:3], second.line, check.attributes=FALSE)
    # [1] TRUE
    

    Note 4: If new.data contains response variable, then predict returns only linear predictor for specified level of y. Again we can check it manually:

    new.data$y <- gl(4,3,length=10)
    lp.clm.y <- predict(fm.clm, new.data, type="linear.predictor")
    lp.clm.y
    
    lp.manual <- sapply(1:10, function(i) lp.clm$eta1[i,new.data$y[i]])
    all.equal(lp.clm.y$eta1, lp.manual)
    # [1] TRUE