Why do R external pointers' "unusual copying semantics" mean they should not be used stand-alone?


Section 5.13, External pointers and weak references, of Writing R Extensions states:

External pointer SEXPs are intended to handle references to C structures such as `handles', and are used for this purpose in package RODBC for example. They are unusual in their copying semantics in that when an R object is copied, the external pointer object is not duplicated. (For this reason external pointers should only be used as part of an object with normal semantics, for example an attribute or an element of a list.)

What is meant here by "external pointer object", the external pointer itself or the memory that the external pointer points to? Why would the unusual copying semantics mean that external pointers should only be used as part of an object with normal semantics?

To clarify, my R package is a wrapper around a C library, Baz. The Baz library provides a C structure, Foo, which is used by Baz as a sort of internal working space. Baz provides C functions Foo* baz_allocate_foo() and void baz_free_foo(Foo*) to allocate and free Foo structures, which it does outside of R memory management.

In my R package, I want to use external pointers to store the addresses of these allocated Foo structures. Part of my R package's C++ code (using Rcpp for interfacing) looks like this:

// baz.h is the Baz C library header; contains the definition of struct Foo
#include <R.h>
#include <Rinternals.h>
extern "C" {
#include <baz.h>
}

// For use as the external pointer's tag
#define FOO_CODE 0xF00C0DE

// Finalizer for garbage collection of external pointers to Foo
void finalize_foo(SEXP x)
{
    // Skip pointers that have already been cleared
    Foo* foo = static_cast<Foo*>(R_ExternalPtrAddr(x));
    if (foo == nullptr) return;
    baz_free_foo(foo);
    R_ClearExternalPtr(x);
}

// Exported function for users of my R package
// [[Rcpp::export]]
SEXP get_foo()
{
    SEXP tag = PROTECT(Rf_ScalarInteger(FOO_CODE));
    SEXP x = PROTECT(R_MakeExternalPtr(baz_allocate_foo(), tag, R_NilValue));
    R_RegisterCFinalizerEx(x, finalize_foo, TRUE);
    UNPROTECT(2);
    return x;
}

In R code, the user does things like this:

myfoo = bazwrap::get_foo()
bazwrap::say_hello(myfoo, "Alice")
bazwrap::say_goodbye(myfoo, "Bob")

and the intention is that the memory pointed to by myfoo gets freed either when myfoo is garbage collected or before R exits.

So here I am using an external pointer completely on its own, not as part of a list or as an attribute of an object with "normal semantics" as advised by Writing R Extensions. I haven't found any issues with this, even when doing e.g.

library(bazwrap)
myfoo1 = get_foo()
myfoo2 = myfoo1
rm(myfoo2)
gc() # as expected, this does not trigger the finalizer as myfoo1 is still around
say_hello(myfoo1, "Alice") # doesn't crash...

This leads me to my questions at the top of the post.


Solution

  • Here, "unusual copying semantics" just means that duplicate1(s, .) in C [1] returns s itself rather than a copy of s. External pointers share these semantics with nine other types; indeed, in duplicate.c we see that duplicate1 does something like:

    switch (TYPEOF(s)) {
    case NILSXP:
    case SYMSXP:
    case ENVSXP:
    case SPECIALSXP:
    case BUILTINSXP:
    case EXTPTRSXP:
    case BCODESXP:
    case WEAKREFSXP:
    case CHARSXP:
    case PROMSXP:
        return s;
    

    The answer to your main question has everything to do with the setting of attributes. For s of one of the above 10 types, attr(s, name) <- value is either an error (NILSXP, SYMSXP, CHARSXP) or sets the attribute on s itself, never on a copy of s. Whether s is referenced elsewhere makes no difference; a copy is never made because duplicate1 is a no-op for these types.

    With that in mind, we can understand attributes on external pointers (EXTPTRSXP) by looking at attributes on environments (ENVSXP), which are an exact analogy with the convenience of an R level API:

    > e1 <- e2 <- new.env()
    > attr(e2, "a")
    NULL
    > attr(e1, "a") <- 0
    > attr(e2, "a")
    [1] 0
    

    That is, setting an attribute on an environment affects all other references to it. The same can be said for external pointers. Hence the advice in the Writing R Extensions manual and in Luke Tierney's original notes is to always place external pointers in a list or pairlist, which duplicate1 does copy, so that attributes set on one copy of the container do not show up on the others:

    > e1 <- e2 <- list(new.env())
    > attr(e2, "a")
    NULL
    > attr(e1, "a") <- 0
    > attr(e2, "a")
    NULL
    

    If you are wondering why it is not an error to set attributes on external pointers, in light of their unusual copying semantics, note that a classed external pointer needs a class attribute, so setting attributes on external pointers has to be possible. I imagine that R-core wanted to allow classed external pointers, even if its advice is to instead use classed lists or pairlists containing unclassed external pointers.
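
As a rough sketch of that advice, the following again uses an environment as a stand-in for an external pointer, since both are returned uncopied by duplicate1. The constructor new_handle and the class names are hypothetical and only serve the illustration. It contrasts classing the bare reference with classing a list that contains it:

```r
# Hypothetical constructor (not part of any package): wrap a reference
# object in a classed list, so that attributes live on the list.
new_handle <- function(ref) structure(list(ref = ref), class = "handle")

# Classing the bare reference affects every binding to it.
bare1 <- new.env()
bare2 <- bare1
class(bare2) <- "bare_handle"
inherits(bare1, "bare_handle")   # TRUE

# Attributes on the wrapping list follow normal copy semantics.
h1 <- new_handle(new.env())
h2 <- h1
attr(h2, "note") <- "only on h2"
is.null(attr(h1, "note"))        # TRUE: h1 is unaffected
identical(h1$ref, h2$ref)        # TRUE: both still hold the same reference
```

In the package from the question, the list element would presumably hold the result of get_foo(), and functions such as say_hello would extract it before passing it on to C.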


    [1] API functions duplicate and shallow_duplicate are just wrappers for duplicate1.