
How to find the default constructor methods for a class


The problem came up while I was experimenting with a package: I found that new(Class = 'ddmatrix', Data = X) and ddmatrix(Data = X) yield different results, where X is a matrix (one can think of class ddmatrix as a transformed matrix class).

Document

In the package, an S4 class ddmatrix is defined, along with a generic constructor function created by setGeneric(name = 'ddmatrix'). Further, the package defines setMethod('ddmatrix', signature = 'matrix', ...) as below:

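 setMethod("ddmatrix", signature(data="matrix"),
     function(data, nrow=1, ncol=1, byrow=FALSE, ...,
              bldim=.pbd_env$BLDIM, ICTXT=.pbd_env$ICTXT)
     {
         # drop the dim attribute, so data becomes a plain vector
         dim(data) <- NULL
         # calling the generic again now dispatches on the vector's
         # class rather than on "matrix"
         ret <- ddmatrix(data=data, nrow=nrow, ncol=ncol, byrow=byrow,
                         bldim=bldim, ICTXT=ICTXT)
         return( ret )
     }
 )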

I am confused about how ddmatrix is used inside the above setMethod('ddmatrix', signature = 'matrix') definition. Is the ddmatrix called there the default method for the generic ddmatrix?

Meanwhile, when I call new('ddmatrix', Data = X), which method is called to build a new ddmatrix object from a matrix object? The new function is:

function (Class, ...) 
{
    ClassDef <- getClass(Class, where = topenv(parent.frame()))
    value <- .Call(C_new_object, ClassDef)
    initialize(value, ...)
}

Question

To explain the discrepancy between new('ddmatrix') and ddmatrix(), I think one way is to find the default constructor. The package also defines setMethod('ddmatrix', signature = 'vector', ...). Is this the default one?


Solution

  • At some level this is up to the author. Many people view new() and @ or slot() (for slot access) as strictly for the package developer -- these expose the implementation details directly to the user -- and prefer to write constructors and accessors that place an interface on top of the implementation. This appears to be the case for the package that you are considering, where ddmatrix() is meant to be the user-oriented constructor.

    The author appears to have implemented a facade pattern, where several different methods make relatively minor data transformations before calling another function / method to do the actual object construction. From what you show, it seems ddmatrix,matrix-method invokes ddmatrix,vector-method (because inside ddmatrix,matrix-method the function sets dim(data) <- NULL, turning the matrix into a vector, and then calls ddmatrix() which now dispatches to the vector method), and this constructs the object via new() at https://github.com/RBigData/pbdDMAT/blob/master/R/constructors.r#L191. A different package author could have adopted a different design, where several methods separately call new().
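    A stripped-down sketch of that facade follows. It assumes the real package behaves roughly as described, and the class "Toy" and generic toy() are hypothetical stand-ins for ddmatrix. The matrix method only reshapes its input and re-dispatches, and the vector method is the single place that calls new(). Calling new() directly bypasses both methods, which is one way the two calls in the question can disagree.

    ```r
    library(methods)

    # Hypothetical stand-ins for pbdDMAT's names: class "Toy" and generic toy().
    # The class keeps the values as a plain vector plus the dimensions.
    setClass("Toy", slots = c(Data = "numeric", dims = "integer"))

    setGeneric("toy", function(data, ...) standardGeneric("toy"))

    # vector method: the only place that actually calls new()
    setMethod("toy", signature(data = "vector"),
        function(data, nr = length(data), nc = 1L, ...) {
            new("Toy", Data = as.numeric(data), dims = as.integer(c(nr, nc)))
        })

    # matrix method: a thin facade that records the dimensions, drops the
    # dim attribute and calls the generic again, which now dispatches to
    # the vector method
    setMethod("toy", signature(data = "matrix"),
        function(data, ...) {
            d <- dim(data)
            dim(data) <- NULL
            toy(data, nr = d[1], nc = d[2])
        })

    m <- matrix(1:6, nrow = 2)
    viaConstructor <- toy(m)
    viaConstructor@dims   # 2 3
    viaNew <- new("Toy", Data = as.numeric(m))
    viaNew@dims           # integer(0): new() went to initialize(), never through toy()
    ```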

    The documentation often also helps, e.g., ?ddmatrix does not discuss direct object construction via new().
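    The methods package can also show which methods exist and where a call will land. Here is a sketch on a hypothetical generic pick() with the same matrix/vector pair of methods. With pbdDMAT loaded, the same calls could be pointed at "ddmatrix" instead (e.g. showMethods("ddmatrix")). In S4, a "default" method is the one defined for signature "ANY". In this sketch, neither the matrix method nor the vector method is a default in that sense.

    ```r
    library(methods)

    # Hypothetical generic with a "matrix" and a "vector" method, like ddmatrix
    setGeneric("pick", function(data, ...) standardGeneric("pick"))
    setMethod("pick", "vector", function(data, ...) "vector method")
    setMethod("pick", "matrix", function(data, ...) "matrix method")

    showMethods("pick")              # lists the signatures that have methods

    existsMethod("pick", "matrix")   # TRUE: defined directly for "matrix"
    existsMethod("pick", "integer")  # FALSE: nothing defined directly for "integer"
    existsMethod("pick", "ANY")      # FALSE: no "ANY" (default) method was defined

    # selectMethod() also follows inheritance, so it finds the method an
    # integer argument would actually dispatch to
    selectMethod("pick", "integer")

    pick(matrix(1:4, 2))             # "matrix method"
    pick(1:4)                        # "vector method"
    ```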

    Here's a simpler example. I create a class "A" with a single slot containing a numeric vector:

    setClass("A", slots=c(x="numeric"))
    

    Here I create a constructor, because I want the user to see the interface to the class rather than its implementation:

    A = function(x=numeric())
        new("A", x=x)
    

    So far, A() and new("A") return an object with the same structure, e.g.,

    > new("A")
    An object of class "A"
    Slot "x":
    numeric(0)
    
    > A()
    An object of class "A"
    Slot "x":
    numeric(0)
    

    Maybe, as the developer of the "A" class, I want an uninitialized object of class "A" to have NA as the value of slot x, so I modify the constructor:

    A = function(x = NA_real_)
        new("A", x=x)
    

    Now a direct call to new() returns a different object from a call to A():

    > new("A")
    An object of class "A"
    Slot "x":
    numeric(0)
    
    > A()
    An object of class "A"
    Slot "x":
    [1] NA
    

    Which one is 'correct'? Well, both are correct, but as the creator of the class I intend for the user to create an object of class "A" by calling the function A().
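    If the class author instead wanted new("A") and A() to agree, one option is to put the default value into the class prototype. This goes beyond the design described above and is only a sketch.

    ```r
    library(methods)

    # Sketch: the NA default lives in the class prototype (an assumption,
    # not the original design), so new() and A() start from the same value
    setClass("A", slots = c(x = "numeric"), prototype = list(x = NA_real_))
    A <- function(x = NA_real_) new("A", x = x)

    new("A")@x   # NA
    A()@x        # NA
    ```

    The constructor can still add argument checking or conversions on top. The prototype only makes sure that a bare new("A") is not left with an empty slot.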

    A typical reason for separating the interface (using A() to construct an object) from the implementation (using new() to construct an object) is that the implementation is not obvious to the user. This seems to be the case with the ddmatrix() function: for reasons that only the package author needs to know about, it is convenient to store an R matrix as a vector with information about dimensions. I guess a simple equivalent might be

    setClass("A", slots=c(data="numeric", nrow="integer", ncol="integer"))
    A = function(m=matrix(0, 0, 0)) {
        stopifnot(is(m, "matrix"))
        new("A", data=as.vector(m), nrow=nrow(m), ncol=ncol(m))
    }
    

    For instance:

    > A(matrix(1:10, 5))
    An object of class "A"
    Slot "data":
     [1]  1  2  3  4  5  6  7  8  9 10
    
    Slot "nrow":
    [1] 5
    
    Slot "ncol":
    [1] 2
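    The other half of such an interface is an accessor that gives the user a matrix back without exposing the slots. Below is a minimal sketch that repeats the class above. The function name asMatrix() is hypothetical.

    ```r
    library(methods)

    setClass("A", slots = c(data = "numeric", nrow = "integer", ncol = "integer"))
    A <- function(m = matrix(0, 0, 0)) {
        stopifnot(is(m, "matrix"))
        new("A", data = as.vector(m), nrow = nrow(m), ncol = ncol(m))
    }

    # Hypothetical accessor: rebuilds the matrix from the stored vector and
    # dimensions, so users never need to know the slot layout
    asMatrix <- function(obj) {
        stopifnot(is(obj, "A"))
        matrix(obj@data, nrow = obj@nrow, ncol = obj@ncol)
    }

    m <- matrix(1:10, 5)
    all(asMatrix(A(m)) == m)   # TRUE: a round trip through the object
    ```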
    

    Why does the author want to do this? It doesn't matter to us as users. Why can't we create the same object by calling m = matrix(1:10, 5); new("A", data=as.vector(m), nrow=nrow(m), ncol=ncol(m))? We could, but if the author later decided to change the implementation so that, say, the offsets to the start of each row were also stored, we'd have to understand what the author had done and update our code.
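    That scenario can be sketched as follows. Everything in it is hypothetical: a "version 2" of class A that stores the values row by row plus a rowStart slot with the offset of each row, and an accessor getRow(). Code that goes through A() keeps working unchanged. A direct new() call written for the old layout still runs, but it silently builds an object with column-ordered data and no row offsets.

    ```r
    library(methods)

    # Hypothetical version 2 of class A: row-major storage plus row offsets
    setClass("A", slots = c(data = "numeric", nrow = "integer", ncol = "integer",
                            rowStart = "integer"))
    A <- function(m = matrix(0, 0, 0)) {
        stopifnot(is(m, "matrix"))
        new("A", data = as.vector(t(m)), nrow = nrow(m), ncol = ncol(m),
            rowStart = (seq_len(nrow(m)) - 1L) * ncol(m))
    }

    # Hypothetical accessor that reads row i using the new layout
    getRow <- function(obj, i) {
        start <- obj@rowStart[i]
        obj@data[start + seq_len(obj@ncol)]
    }

    m <- matrix(1:10, 5)
    viaA <- A(m)                              # the constructor absorbs the change
    viaNew <- new("A", data = as.vector(m),   # user code written for version 1
                  nrow = nrow(m), ncol = ncol(m))

    getRow(viaA, 2)     # 2 7, the second row of m
    viaNew@rowStart     # integer(0): the old call builds an incomplete object
    ```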