
How do user defined S3 Group Generic Functions work in R?


I'm reading Advanced R by Hadley Wickham and I'm confused at section 13.7.3 Group Generics.


I was a bit confused by the phrasing, "...you cannot define your own group generic... defining a single group generic for your class..." but I think this section means that if I define the group method Math.MyClass, then all functions within the Math group generic (abs, sign, etc.) will be overridden for MyClass objects.

This can be confirmed by running the following:

my_class <- structure(.Data = -1, class = "MyClass")

my_class
# [1] -1
# attr(,"class")
# [1] "MyClass"

abs(my_class)
# [1] 1
# attr(,"class")
# [1] "MyClass"

Math.MyClass <- function(x) { x }

abs(my_class)
# [1] -1
# attr(,"class")
# [1] "MyClass"

I understand that this follows the special naming scheme generic.class but why is the value of .Data affected in abs(my_class)?

When I created the variable my_class, I set the argument .Data = -1, and the class of -1 is numeric and that should not have changed:

class(unclass(my_class))
# [1] "numeric"

my_numeric <- unclass(my_class)

class(my_numeric)
# [1] "numeric"

abs(my_numeric)
# [1] 1

So why doesn't abs(my_class) print the same result (1) before and after I define Math.MyClass?

I do receive the same results before and after I define Math.MyClass if I define the group generic as Math.MyClass <- function(x) {NextMethod()} but what's the point of having group generics then?

And, why do I get the same answer for abs(my_matrix) both before and after I define Math.matrix when I run the following:

my_matrix <- matrix(data = -1:-10, ncol = 5) + 0.0

class(my_matrix)
# [1] "matrix"

class(my_matrix[1,1])
# [1] "numeric"

my_matrix
#      [,1] [,2] [,3] [,4] [,5]
# [1,]   -1   -3   -5   -7   -9
# [2,]   -2   -4   -6   -8  -10

abs(my_matrix)
#      [,1] [,2] [,3] [,4] [,5]
# [1,]    1    3    5    7    9
# [2,]    2    4    6    8   10

Math.matrix <- function(x) { x }

abs(my_matrix)
#      [,1] [,2] [,3] [,4] [,5]
# [1,]    1    3    5    7    9
# [2,]    2    4    6    8   10

And when I run the following:

your_class <- structure(.Data = list(-1), class = "YourClass")

your_class
# [[1]]
# [1] -1
# 
# attr(,"class")
# [1] "YourClass"

abs(your_class)
# Error in abs(your_class) : non-numeric argument to mathematical function

class(unclass(your_class))
# [1] "list"

your_list <- list(-1)

class(your_list)
# [1] "list"

abs(your_list)
# Error in abs(your_list) : non-numeric argument to mathematical function

It's clear that the class of .Data does matter (initially anyway) because both abs(your_class) and abs(your_list) result in the same error.

To make things even more challenging, I found that everything goes back to normal for MyClass objects once I run rm(Math.MyClass):

my_class
# [1] -1
# attr(,"class")
# [1] "MyClass"

abs(my_class)
# [1] -1
# attr(,"class")
# [1] "MyClass"

rm(Math.MyClass)

abs(my_class)
# [1] 1
# attr(,"class")
# [1] "MyClass"

Could someone explain more completely what group generics are (Why do group generics exist / what do they accomplish / what is their parent-child relationship with R objects / why are the data arguments in some objects affected when the group generic is defined and others are not / etc)?

I have more experience with OOP in Python than in R if you feel it's easier to explain with Python examples. Any help is greatly appreciated!


Solution

  • Group generics allow you to change behavior for a group of functions for a particular data type. The best way to explain that is to look at some examples. If you run methods("Math") you'll see which classes have this function defined.

    In the case of Math.Date you'll see

    function (x, ...) 
    stop(gettextf("%s not defined for \"Date\" objects", .Generic), 
        domain = NA)
    

    So all that does is tell you that all these functions are not defined for Date objects. For example

    abs(as.Date("2020-01-01"))
    # Error in Math.Date(as.Date("2020-01-01")) : 
    #   abs not defined for "Date" objects
    

    By setting this behavior at the group level, there is no need to write a separate method for every function in the Math group just to report that it is not defined for that class (dates aren't numeric "in that way"). But even though trunc() is in that group and you might expect an error, it actually works:

    trunc(as.Date("2020-01-01"))
    # [1] "2020-01-01"
    

    And that's because trunc.Date is defined.

    function (x, ...) 
    round(x - 0.4999999)
    

    So the special part of group generics is that you can define a default "fallback" behavior for common math-y functions. But if you want to change that behavior, you can still provide a class specific implementation for a particular function.
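
    The fallback-plus-override pattern described above can be sketched with a made-up class. The class name Celsius and both methods below are assumptions for illustration only; they mirror how Math.Date and trunc.Date relate, not any code from base R.

    ```r
    # Hypothetical class "Celsius", used only for illustration.
    temp <- structure(-3.7, class = "Celsius")

    # Group-level fallback: every Math function is refused by default.
    Math.Celsius <- function(x, ...) {
      stop(sprintf("%s not defined for \"Celsius\" objects", .Generic),
           call. = FALSE)
    }

    # Member-specific method: abs() gets its own implementation, which
    # takes precedence over the group method Math.Celsius.
    abs.Celsius <- function(x) {
      structure(abs(unclass(x)), class = class(x))
    }

    abs(temp)   # handled by abs.Celsius, value 3.7 with class "Celsius"

    # exp() has no specific method, so it falls through to Math.Celsius.
    res <- tryCatch(exp(temp), error = function(e) conditionMessage(e))
    res
    ```

    Here abs() succeeds because a method for that specific function exists, while exp() hits the group-level error, just as trunc() works for dates while abs() does not.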

    Note that the same Math.MyClass is called for all those functions listed in the Math group. There is a variable available to the function to know which function was actually called. That variable is called .Generic and is discussed on the ?UseMethod help page. For example

    Math.MyClass <- function(x, ...) { paste(.Generic, x) }
    abs(my_class)
    # [1] "abs -1"
    trunc(my_class)
    # [1] "trunc -1"
    exp(my_class)
    # [1] "exp -1"
    

    which hopefully makes it clear that you don't want to put a transformation directly in the group method without first checking .Generic, because the same code runs for every function in the group. For an example of a method that does branch on .Generic, check out Math.difftime

    function (x, ...) 
    {
        switch(.Generic, abs = , sign = , floor = , ceiling = , trunc = , 
            round = , signif = {
                units <- attr(x, "units")
                .difftime(NextMethod(), units)
            }, stop(gettextf("'%s' not defined for \"difftime\" objects", 
                .Generic), domain = NA))
    }
    

    You can see that for a specific subset of functions it hands off to the default implementation via NextMethod() and reattaches the units; for everything else it throws an error.
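
    The same switch-on-.Generic idea can be applied to a user-defined class. This is a minimal sketch, assuming a made-up class Money and a hypothetical constructor new_money(); neither exists in base R, and the choice of which functions to allow is arbitrary.

    ```r
    # Hypothetical constructor for an illustrative "Money" class.
    new_money <- function(x) structure(x, class = "Money")

    Math.Money <- function(x, ...) {
      switch(.Generic,
        # Functions that make sense for amounts: delegate to the default
        # implementation with NextMethod(), then re-wrap the result so the
        # class is set explicitly.
        abs = , floor = , ceiling = , round = , trunc = {
          new_money(unclass(NextMethod()))
        },
        # Everything else in the Math group is rejected.
        stop(sprintf("'%s' is not meaningful for Money objects", .Generic),
             call. = FALSE)
      )
    }

    m <- new_money(c(-1.5, 2.25))
    abs(m)      # values 1.5 and 2.25, still of class "Money"
    floor(m)    # values -2 and 2, still of class "Money"
    msg <- tryCatch(sqrt(m), error = function(e) conditionMessage(e))
    msg
    ```

    This also shows why a group method that calls NextMethod() is not pointless: one method decides, in a single place, which members of the group are allowed and how their results are wrapped.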

    So when you defined

    Math.MyClass <- function(x) { x }
    

    Basically you told R that you would handle the call to abs() (and every other Math function) for objects of that class. Because your implementation returns x unchanged, abs(my_class) just gives back the same object, -1 included. The underlying data was never altered; the default abs() was simply never run. When Math.MyClass is not defined (or after you rm() it), R finds no method for the class and falls back to the internal default, which works on the underlying numeric data. That is also why calling NextMethod() inside the method gives you the original results.

    As for why the matrix behavior didn't change, that's because the class of my_matrix is implicit: it is derived from the dim attribute rather than stored on the object. If you dput() the object, you'll see

    dput(my_matrix)
    # structure(
    #     c(-1, -2, -3, -4, -5, -6, -7, -8, -9, -10), 
    #     .Dim = c(2L, 5L))
    

    Note that it's not storing the class in the object itself like your my_class object

    dput(my_class)
    # structure(-1, class = "MyClass")
    

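    With implicit classes, dispatch works a bit differently. abs() and the other Math functions are internal generics, and internal dispatch only looks at the class attribute stored on the object (is.object(x) must be TRUE). my_matrix has no such attribute, so Math.matrix is never consulted and the default abs() runs. Note that this will behave differently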

    my_matrix2 <- structure(my_matrix, class="matrix")
    abs(my_matrix2)
    #      [,1] [,2] [,3] [,4] [,5]
    # [1,]   -1   -3   -5   -7   -9
    # [2,]   -2   -4   -6   -8  -10
    # attr(,"class")
    # [1] "matrix"
    

    You can see that in this case Math.matrix is called and nothing is changed.
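
    The difference between the implicit class and a stored class attribute can be checked directly. The sketch below uses a small matrix m and a Math.matrix method that returns a marker string; both are assumptions chosen to make the dispatch visible, not something you would define in real code.

    ```r
    # Marker method: returns a string so it is obvious when it is called.
    Math.matrix <- function(x, ...) "Math.matrix called"

    m <- matrix(-1:-4, ncol = 2)

    is.object(m)            # FALSE: no class attribute is stored
    oldClass(m)             # NULL: this is what internal dispatch looks at
    inherits(m, "matrix")   # TRUE: the class is only implicit
    abs(m)                  # default abs() runs, Math.matrix is ignored

    # Storing the class as an attribute makes the object eligible
    # for internal dispatch.
    m2 <- structure(m, class = "matrix")

    is.object(m2)           # TRUE
    abs(m2)                 # now handled by Math.matrix
    ```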