What does exportClasses actually do with S4 classes?


I was trying to understand how namespacing works with S4 classes, so I put together a small example using different export directives. The results make no sense to me, so I was hoping someone could help explain what's going on.

My package consists of just 1 file with the following code:

B1 <- setClass( "Base_1", slots = list(x = "numeric"))
B2 <- setClass( "Base_2", slots = list(x = "numeric"))
B3 <- setClass( "Base_3", slots = list(x = "numeric"))
B4 <- setClass( "Base_4", slots = list(x = "numeric"))

I then define the following in the NAMESPACE file:

export(B1)
exportClasses(Base_1)
export(B2)
exportClasses(Base_3)

I then load and try to use the package as follows:

library(tstpkg)

B1(x = 1)   # Runs fine
B2(x = 1)   # Runs fine
B3(x = 1)   # Errors (as expected)
B4(x = 1)   # Errors (as expected)

new("Base_1", x = 1)   # Runs fine
new("Base_2", x = 1)   # Runs fine - Was expecting to error
new("Base_3", x = 1)   # Runs fine
new("Base_4", x = 1)   # Runs fine - Was expecting to error

setClass("Derv_1", contains = "Base_1")   # Runs fine
setClass("Derv_2", contains = "Base_2")   # Runs fine - Was expecting to error
setClass("Derv_3", contains = "Base_3")   # Runs fine
setClass("Derv_4", contains = "Base_4")   # Runs fine - Was expecting to error

I was expecting new(<class>) and setClass(..., contains=<class>) to fail for Base_2 and Base_4, as neither of these classes was exported. Can anyone explain what's happening here?

(I put all the code into a GitHub repo here if you want to try it yourselves.)


Solution

  • Classes that you export are placed in a list of exports, stored in your package namespace. Here is a list of classes exported by package Matrix, version 1.6-0:

    ns <- asNamespace("Matrix")
    ns.exports <- getNamespaceInfo(ns, "exports")
    cl1 <- sort(grep("^[.]__C__", names(ns.exports), value = TRUE))
    cl1[1:20]
    
     [1] ".__C__BunchKaufman"              ".__C__BunchKaufmanFactorization"
     [3] ".__C__CHMfactor"                 ".__C__CHMsimpl"                 
     [5] ".__C__CHMsuper"                  ".__C__Cholesky"                 
     [7] ".__C__CholeskyFactorization"     ".__C__CsparseMatrix"            
     [9] ".__C__LU"                        ".__C__Matrix"                   
    [11] ".__C__MatrixFactorization"       ".__C__QR"                       
    [13] ".__C__RsparseMatrix"             ".__C__Schur"                    
    [15] ".__C__SchurFactorization"        ".__C__TsparseMatrix"            
    [17] ".__C__abIndex"                   ".__C__atomicVector"             
    [19] ".__C__compMatrix"                ".__C__corMatrix"
    

    Here is a list of classes defined in the Matrix namespace but not exported:

    cl0 <- setdiff(sort(grep("^[.]__C__", names(ns), value = TRUE)), cl1)
    cl0
    
     [1] ".__C__dCsparseMatrix" ".__C__determinant"    ".__C__geMatrix"      
     [4] ".__C__lCsparseMatrix" ".__C__mMatrix"        ".__C__nCsparseMatrix"
     [7] ".__C__numLike"        ".__C__replValueSp"    ".__C__seqMat"        
    [10] ".__C__xMatrix"
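
    To check a single class rather than list them all, the same lookup can be wrapped in a small helper. This is only a sketch: is_class_exported is a hypothetical name, not a function provided by R or by Matrix, and it assumes that the package in question is installed.

    ```r
    # Hypothetical helper (not part of R or Matrix): TRUE if package `pkg`
    # lists the S4 class `cls` among its namespace exports.
    is_class_exported <- function(cls, pkg) {
      ns <- asNamespace(pkg)  # loads the namespace if necessary
      # Class definitions are stored under names of the form .__C__<class>
      paste0(".__C__", cls) %in% names(getNamespaceInfo(ns, "exports"))
    }

    is_class_exported("Matrix", "Matrix")   # appears in the exported list above
    is_class_exported("mMatrix", "Matrix")  # unexported in version 1.6-0
    ```

    The result for mMatrix reflects Matrix 1.6-0 only; other versions may export a different set of classes.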
    

    Packages export classes in order to define what other packages can and cannot import. If another package tries to import a class not exported from your package, then installation of that package will fail inside of a call to importIntoEnv, with an error of the form:

    class %s is not exported by 'namespace:%s'

    Packages that import classes from other packages cache the definitions of the imported classes in the parent environment of their namespace. They can also cache superclass definitions, but in the namespace itself, not its parent. The latter cache is unknown to most S4-using package maintainers and is liable to become stale; I explain below.

    Here is a list of classes exported by Matrix, version 1.6-0, and imported by SeuratObject, version 4.1.3:

    ns <- asNamespace("SeuratObject")
    ns.imports <- getNamespaceInfo(ns, "imports")
    ns.imports.Matrix <- c(ns.imports[names(ns.imports) == "Matrix"],
                           recursive = TRUE, use.names = FALSE)
    cl1 <- sort(grep("^[.]__C__", ns.imports.Matrix, value = TRUE))
    cl1
    
    [1] ".__C__dgCMatrix"
    

    Just one, conveniently. Here is the cached class definition:

    cl1.def <- mget(cl1, parent.env(ns))
    str(cl1.def, max.level = 3L)
    
    List of 1
     $ .__C__dgCMatrix:Formal class 'classRepresentation' [package "methods"] with 11 slots
      .. ..@ slots     :List of 6
      .. ..@ contains  :List of 11
      .. ..@ virtual   : logi FALSE
      .. ..@ prototype :Formal class 'S4' [package ""] with 0 slots
     list()
      .. ..@ validity  :function (object)  
      .. ..@ access    : list()
      .. ..@ className : chr "dgCMatrix"
      .. .. ..- attr(*, "package")= chr "Matrix"
      .. ..@ package   : chr "Matrix"
      .. ..@ subclasses: list()
      .. ..@ versionKey:<externalptr> 
      .. ..@ sealed    : logi FALSE
    

    And here are the cached superclass definitions:

    scl1 <- paste0(".__C__", sort(names(cl1.def[[1L]]@contains)))
    scl1.def <- mget(scl1, ns, ifnotfound = list(NULL)) # NULL <=> unexported
    str(scl1.def, max.level = 2L)
    
    List of 11
     $ .__C__CsparseMatrix :Formal class 'classRepresentation' [package "methods"] with 11 slots
     $ .__C__Matrix        :Formal class 'classRepresentation' [package "methods"] with 11 slots
     $ .__C__compMatrix    :Formal class 'classRepresentation' [package "methods"] with 11 slots
     $ .__C__dCsparseMatrix: NULL
     $ .__C__dMatrix       :Formal class 'classRepresentation' [package "methods"] with 11 slots
     $ .__C__dsparseMatrix :Formal class 'classRepresentation' [package "methods"] with 11 slots
     $ .__C__generalMatrix :Formal class 'classRepresentation' [package "methods"] with 11 slots
     $ .__C__mMatrix       : NULL
     $ .__C__replValueSp   : NULL
     $ .__C__sparseMatrix  :Formal class 'classRepresentation' [package "methods"] with 11 slots
     $ .__C__xMatrix       : NULL
    

    The caching of superclass definitions here points to a bug in SeuratObject: it imports dgCMatrix but none of its exported superclasses, i.e., its NAMESPACE has

    importClassesFrom(Matrix, dgCMatrix)
    

    but should really have

    importClassesFrom(Matrix, dgCMatrix,
                      ## and the exported superclasses:
                      CsparseMatrix, Matrix, compMatrix, dMatrix, 
                      dsparseMatrix, generalMatrix, sparseMatrix)
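
    A directive like this can be drafted mechanically by intersecting the superclasses of a class with the exports of the package that defines it. The following is a sketch under that assumption; import_directive is a hypothetical helper, not an existing function, and its output should be reviewed before being pasted into a NAMESPACE file.

    ```r
    # Hypothetical helper: draft an importClassesFrom() directive for class
    # `cls` from package `pkg`, including the superclasses that `pkg` exports.
    import_directive <- function(cls, pkg) {
      ns <- asNamespace(pkg)
      exported <- names(getNamespaceInfo(ns, "exports"))
      # Look up the class definition stored in the defining namespace
      def <- get(paste0(".__C__", cls), envir = ns, inherits = FALSE)
      supers <- as.character(names(def@contains))
      # Keep only the superclasses that can actually be imported
      supers <- sort(supers[paste0(".__C__", supers) %in% exported])
      sprintf("importClassesFrom(%s, %s)", pkg,
              paste(c(cls, supers), collapse = ", "))
    }

    cat(import_directive("dgCMatrix", "Matrix"), "\n")
    ```

    Unexported superclasses (such as mMatrix in version 1.6-0) are dropped, because trying to import them would fail at install time.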
    

    A consequence is that the superclass definitions cached in the namespace can eventually become stale. Whereas the definition of dgCMatrix is retrieved at load time (when the parent environment of the namespace is populated), the definitions of the superclasses are retrieved at install time (when the namespace itself is populated [and serialized]). If the Matrix version available at load time differs from the one available at install time, and if those versions contain conflicting superclass definitions, then users of SeuratObject can run into problems - ones that are quite hard to debug if you are not already familiar with the caching mechanism.
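
    One way to look for stale entries is to compare each definition cached in the importing namespace with the current definition in the defining namespace. The sketch below is an illustration, not an established diagnostic: stale_cached_classes is a hypothetical name, and comparing only the slots and the names of the superclasses is an assumption about what counts as a conflict.

    ```r
    # Hypothetical helper: names of classes cached in the namespace of `pkg`
    # that claim to come from package `from` but whose slots or superclasses
    # differ from the definition currently found in `from`.
    stale_cached_classes <- function(pkg, from) {
      ns <- asNamespace(pkg)
      from.ns <- asNamespace(from)
      nms <- grep("^[.]__C__", names(ns), value = TRUE)
      is.stale <- vapply(nms, function(nm) {
        cached <- get(nm, envir = ns, inherits = FALSE)
        if (!methods::is(cached, "classRepresentation") ||
            !identical(as.character(cached@package), from))
          return(FALSE)  # not a class defined by `from`
        current <- get0(nm, envir = from.ns, inherits = FALSE)
        if (is.null(current))
          return(TRUE)   # `from` no longer defines this class
        !identical(cached@slots, current@slots) ||
          !identical(names(cached@contains), names(current@contains))
      }, logical(1L))
      sub("^[.]__C__", "", nms[is.stale])
    }

    # A package compared with itself should report nothing
    stale_cached_classes("Matrix", "Matrix")
    ```

    With SeuratObject installed, stale_cached_classes("SeuratObject", "Matrix") would apply the same comparison to the cache described above.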

    The Writing R Extensions manual should probably document that if you use a class, then you also implicitly use its superclasses and must import those, too.

    These pitfalls are just one of many reasons why maintainers should try to preserve backwards compatibility when changing the definitions of classes that they export.

    Finally, why do new and setClass "see" unexported classes? In the case of new, I imagine that part of the reason is performance. Instantiation must be fast, and the test for "exportedness" has a nontrivial cost. In the case of setClass, I'm not so sure. That definitely seems like a bug, but I'd want to think a bit harder about it, and maybe even consult the R-devel mailing list (also about amending WRE) ... which I've now done.