Search code examples
rregressionlogistic-regressionsplinesurvival-analysis

Error retrieving the summary from clogit model


I am training a Conditional logistic regression model with clogit on multiple datasets (with thousands of events) in the following way:

library(survival) 
library(mgcv)

# load dataset 
df <- read.csv('1.csv')

model <- clogit(case ~ 
           var1 +
           # pspline(var2, df = 3) +
           strata(var3),
           data = df)

print(model)
summary(model)

The column types are: case: int, var1: factor, var2: int, var3: int.

If I keep this line commented: pspline(var2, df = 3) +, the summary printing works fine when the dataset has enough cases in each strata, otherwise I get the following warning and very large standard errors:

Warning message in fitter(X, Y, strats, offset, init, control, weights = weights, :
“Loglik converged before variable  1,2,3 ; beta may be infinite. ”

However, if I use the line pspline(var2, df = 3) +, then I do not get such warnings even when the dataset does not have enough cases in each strata. The print(model) line works, but I get the following error when I try to access the summary of the model:

Error in pchisq(cox$score, df, lower.tail = FALSE): Non-numeric argument to mathematical function
Traceback:

1. print(summary(model))
2. summary(model)
3. summary.coxph(model)
4. pchisq(cox$score, df, lower.tail = FALSE)

I need to access the summary because I am printing the coefficients to a csv file for later processing: summary(model)$coefficients since I am training the models on multiple files.

I could not find a reason for this behavior, and any help would be appreciated.


Edit: 06.26 Minimum reproducible example:

num_cases = 100
var3 = rep((1:num_cases), each=3)
case = rep(c(0, 1, 1), num_cases)
var1 = factor(sample(c("Low", "Medium", "High"), num_cases, replace=TRUE, prob = c(0.5,0.35,0.25)))
var2 = runif(num_cases * 3, 10, 35)

generated_data <- data.frame(var3, case, var1, var2)

model <- clogit(case ~ 
           var1 +
           pspline(var2, df = 3) +
           strata(var3),
           data = generated_data)

print(model)
summary(model)$coefficients

Result:

  • Adding a comma after case ~ var1 does not produce the error. The code now prints coefficients, but the coefficients it returns are different to the ones returned when I remove the comma and use print(model).

  • The above code fails to converge when num_cases = 200.

  • Adding the comma after case ~ var1 produces another warning:

    Warning message in clogit(case ~ var1, +pspline(var2, df = 3) + strata(var3), data = generated_data): “weights ignored: not possible for the exact method”


Solution

  • What's happening is that an object of class 'clogit' is being passed by inheritance to summary.cph which is apparently not designed for it. You can get the coefficients from the print.clogit function, which is what is being implicitly called when you ask for the models results:

     model
    Call:
    clogit(case ~ var1 + pspline(var2, df = 3) + strata(var3), data = generated_data)
    
                   coef exp(coef) se(coef)      z     p
    var1Low    -0.02063   0.97958  0.23019 -0.090 0.929
    var1Medium -0.02171   0.97852  0.23831 -0.091 0.927
    ps(var2)3  -0.05886   0.94284  0.40900 -0.144 0.886
    ps(var2)4  -0.10752   0.89806  0.60868 -0.177 0.860
    ps(var2)5  -0.16100   0.85129  0.65381 -0.246 0.805
    ps(var2)6  -0.23156   0.79330  0.63652 -0.364 0.716
    ps(var2)7  -0.26708   0.76561  0.61080 -0.437 0.662
    ps(var2)8  -0.23270   0.79239  0.59903 -0.388 0.698
    ps(var2)9  -0.23075   0.79394  0.59781 -0.386 0.700
    ps(var2)10 -0.27852   0.75690  0.60117 -0.463 0.643
    ps(var2)11 -0.26878   0.76431  0.64828 -0.415 0.678
    ps(var2)12 -0.24330   0.78404  0.84486 -0.288 0.773
    
    Likelihood ratio test=0.92  on 5.04 df, p=0.9702
    n= 300, number of events= 200 
    

    As a bonus you get the LLR test value and a p-value. If you just want the matrix of the sort typically returned by a summary function, then make the obvious mods to the section of code in print.coxph:

    { coef <- model$coefficients
    se <- sqrt(diag(model$var))
    if (is.null(coef) | is.null(se)) 
        stop("Input is not valid")
    if (is.null(model$naive.var)) {
        tmp <- cbind(coef, exp(coef), se, coef/se, pchisq((coef/se)^2, 
                                                          1, lower.tail = FALSE))
        dimnames(tmp) <- list(names(coef), c("coef", "exp(coef)", 
                                             "se(coef)", "z", "p"))} }
    

    And then tmp will be the desired matrix:

     tmp
                      coef exp(coef)  se(coef)           z         p
    var1Low    -0.02062820 0.9795831 0.2301878 -0.08961464 0.9285935
    var1Medium -0.02171186 0.9785221 0.2383051 -0.09110952 0.9274056
    ps(var2)3  -0.05886243 0.9428365 0.4089999 -0.14391797 0.8855652
    ps(var2)4  -0.10752217 0.8980566 0.6086828 -0.17664728 0.8597855
    ps(var2)5  -0.16099704 0.8512946 0.6538079 -0.24624519 0.8054924
    ps(var2)6  -0.23155828 0.7932965 0.6365191 -0.36378845 0.7160160
    ps(var2)7  -0.26708193 0.7656103 0.6108000 -0.43726573 0.6619186
    ps(var2)8  -0.23269795 0.7923929 0.5990301 -0.38845785 0.6976772
    ps(var2)9  -0.23074825 0.7939393 0.5978122 -0.38598783 0.6995057
    ps(var2)10 -0.27852015 0.7569030 0.6011671 -0.46329903 0.6431500
    ps(var2)11 -0.26877900 0.7643121 0.6482824 -0.41460174 0.6784335
    ps(var2)12 -0.24329853 0.7840374 0.8448572 -0.28797593 0.7733652
    

    I'm not sure a bug report is warranted but if you think otherwise, then Thomas Lumley is the clogit author. The help page for clogit does not describe a summary method and it appears that the print.clogit, with dispatching to print.coxph method was being used for the purposes typically assigned to summary.

    In addition, the coefficients themselves could have been obtained with model$coef but that would not have returned the full matrix of coefficients pus ancillary statistical estimates,