Coxph "X matrix deemed to be singular" without perfect classification?

I am trying to carry out a self-controlled case series using the package sccs from here. This is a statistical method which takes "baseline" and "exposed" periods in the timecourse of e.g. a year of a patient's life. The exposed periods could represent exposure to a drug and the outcome measured could be a side-effect of the drug as, indeed, it is in my case.

This package essentially formats the data into intervals of baseline and exposure risk. A patient identifier indivL (factor), interval (integer, number of days), exposure status (0/1), event status (0/1). It then feeds this data into survival::clogit as a model of the form:

event ~ exposure + strata(indivL) + offset(log(interval))

The data fed to clogit is a data frame of the form:

   indivL event eventday  lower  upper interval age   exposure  indiv aevent astart   aend drugtype
 * <fct>  <dbl>    <int>  <dbl>  <dbl>    <dbl> <fct> <fct>    <dbl>  <dbl>  <dbl>  <dbl>     <dbl>
 1 1         0.    22361 22219. 22252.      34. 1     0           1. 22361. 22219. 22460.        0.
 2 1         0.    22361 22253. 22260.       8. 1     1           1. 22361. 22219. 22460.        0.
 3 1         1.    22361 22261. 22460.     200. 1     0           1. 22361. 22219. 22460.        0.
 4 2         0.    22401 22219. 22252.      34. 1     0           1. 22401. 22219. 22460.        0.
 5 2         0.    22401 22253. 22260.       8. 1     1           1. 22401. 22219. 22460.        0.
 6 2         1.    22401 22261. 22460.     200. 1     0           1. 22401. 22219. 22460.        0.
 7 3         0.    31071 30834. 30863.      30. 1     0           2. 31071. 30834. 31075.        0.
 8 3         0.    31071 30864. 30871.       8. 1     1           2. 31071. 30834. 31075.        0.
 9 3         1.    31071 30872. 31075.     204. 1     0           2. 31071. 30834. 31075.        0.
10 4         1.      261   207.   356.     150. 1     0           3.   261.   207.   425.        0.
# ... with 1,211,460 more rows

I have my model working well for giving me a result when using the above. However, I want to add in other independent variables. These are binary categoricals and I have tried them as both 0/1 integers and 2-level factors. One example would be drugtype. In this case the model takes the form:

event ~ exposure + drugtype + strata(indivL) + offset(log(interval))

My error is:

Warning message:
In coxph(formula = Surv(rep(1, 176241L), event) ~ exposure + drugtype +  :
  X matrix deemed to be singular; variable 2

My model is:

--SNIP--
coxph(formula = Surv(rep(1, 176241L), event) ~ exposure + drugtype + 
     strata(indivL) + offset(log(interval)), data = chopdat, method = "exact")

  n= 176049, number of events= 58602 
   (192 observations deleted due to missingness)

             coef exp(coef) se(coef)     z            Pr(>|z|)    
exposure1 0.70760   2.02912  0.01662 42.57 <0.0000000000000002 ***
drugtype       NA        NA  0.00000    NA                  NA    
--SNIP--

As you can see, it does not like drugtype, which is a binary variable.

Having looked around I've come across several sources which suggest the problem is a case of "perfect classification" i.e. one of my variables perfectly predicts the presence of another. However, using xtabs() I get:

> xtabs(~drugtype + event, data = chopdat)

         event
 drugtype      0      1
        0 778306 388279
        1  29344  14625

and

> xtabs(~ exposure + event, data = chopdat)

        event
exposure      0      1
       0 427482 380101
       1 380788  23113

and

> xtabs(~ drugtype + exposure, data = chopdat)

       drugtype
exposure      0      1
       0 777655  29308
       1 388930  14661

Suggesting there is a good distribution and no perfect classification.

Can anyone point me in the right direction for some more information on this? I feel I've reached the limits of what I'm able to do with the documentation and searching for other answers to this question on StackOverflow.

Many thanks.

Solution

Right, hello all.

Many thanks to @Mike for trying to help me. I have found the explanation for this behaviour and it is a quirk of the SCCS method of modelling rather than to do with clogit itself.

From Farrington, Whitaker and Weldeselassie Self-Controlled Case Series Studies, A Modelling Guide with R:

Note that the main effect of the covariate is not included in the model formula, since it cannot be estimated in a SCCS model as it drops out of the likelihood.

So, the main effect cannot be estimate and is thus returned as NA.

Apologies for posting a question to which I should have been able to find an answer but hopefully this will be of use to anyone else who runs across this "problem".