
Can I build an environment object from a memory address?


I found that we can build/reconstruct an external pointer from a memory address, see this example where I take a pointer from a data table object and rebuild it:

# devtools::install_github("randy3k/xptr")
iris_dt <- data.table::as.data.table(iris)
ptr1 <- attr(iris_dt, ".internal.selfref")
ptr1
#> <pointer: 0x13c00d4e0>
typeof(ptr1)
#> [1] "externalptr"

address <- xptr::xptr_address(ptr1)
address
#> [1] "0x13c00d4e0"
ptr2 <- xptr::new_xptr(address)
identical(ptr1, ptr2)
#> [1] TRUE

Obviously xptr::new_xptr("0x13c00d4e0") is not stable between sessions. I am aware that the above does not allocate memory but merely defines a binding, which is fine for my use case.

I want to do the same with environments :

e <- new.env()
e
#> <environment: 0x10b5bf038>

env("0x10b5bf038") # I want this "env" function
#> <environment: 0x10b5bf038>

I doubt base R can do it, so I'm open to packaged options and C magic.

Unrequired reading addressing the X/Y comment

I need this for the {constructive} package. Say I want to explore the asNamespace("stats")$.__NAMESPACE__.$DLLs object; the way it prints is not very helpful:

asNamespace("stats")$.__NAMESPACE__.$DLLs
#> $stats
#> DLL name: stats
#> Filename: /opt/R/4.2.1-arm64/Resources/library/stats/libs/stats.so
#> Dynamic lookup: FALSE

dput() is often ugly and brittle. Additionally, the code it produces here is not syntactic, so I cannot be sure that the object is accurately described.

dput(asNamespace("stats")$.__NAMESPACE__.$DLLs)
#> list(stats = structure(list(name = "stats", path = "/opt/R/4.2.1-arm64/Resources/library/stats/libs/stats.so", 
#>     dynamicLookup = FALSE, handle = <pointer: 0x2011ce960>, info = <pointer: 0x6000021f00c0>), class = "DLLInfo"))

str() does somewhat better, but it is not ideal in the general case:

str(asNamespace("stats")$.__NAMESPACE__.$DLLs)
#> List of 1
#>  $ stats:List of 5
#>   ..$ name         : chr "stats"
#>   ..$ path         : chr "/opt/R/4.2.1-arm64/Resources/library/stats/libs/stats.so"
#>   ..$ dynamicLookup: logi FALSE
#>   ..$ handle       :Class 'DLLHandle' <externalptr> 
#>   ..$ info         :Class 'DLLInfoReference' <externalptr> 
#>   ..- attr(*, "class")= chr "DLLInfo"

{constructive} guarantees that it outputs code that reproduces the object, and it now works for objects containing pointers.

constructive::construct(asNamespace("stats")$.__NAMESPACE__.$DLLs)
#> list(
#>   stats = list(
#>     name = "stats",
#>     path = "/opt/R/4.2.1-arm64/Resources/library/stats/libs/stats.so",
#>     dynamicLookup = FALSE,
#>     handle = constructive::external_pointer("0x2051d6960") |>
#>       structure(class = "DLLHandle"),
#>     info = constructive::external_pointer("0x600002970de0") |>
#>       structure(class = "DLLInfoReference")
#>   ) |>
#>     structure(class = "DLLInfo")
#> )

I have other ways in the package to handle environments, such as building equivalent environments from lists, but I want to integrate an alternative that uses only the memory address, because it would print more nicely and is enough for some use cases.


Solution

  • You'll want to do a bit more work to make this safe and portable, if a safe and portable version exists. As written, the function returns R_NilValue (NULL) when the object at the address is not an environment. That type check itself reads the memory at the address, so it only helps when the address points to a valid R object. Note that the integer type uintptr_t and the corresponding format specifier macro SCNxPTR are optional in C99; see the advice in Writing R Extensions (WRE) under "Writing portable packages".

    /* objectFromAddress.c */
    
    #include <inttypes.h> /* uintptr_t, SCNxPTR */
    #include <stdio.h> /* sscanf */
    #include <Rinternals.h> /* SEXP, etc. */
    
    SEXP objectFromAddress(SEXP a) {
        uintptr_t p = 0;
        
        if (TYPEOF(a) != STRSXP || XLENGTH(a) != 1 ||
            (a = STRING_ELT(a, 0)) == NA_STRING ||
            sscanf(CHAR(a), "%" SCNxPTR, &p) != 1)
            error("'a' is not a formatted unsigned hexadecimal integer");
        
        SEXP result = (SEXP) p;
        if (TYPEOF(result) != ENVSXP) return R_NilValue;
        return result;
    }
    
    tools::Rcmd(c("SHLIB", "objectFromAddress.c"))
    
    using C compiler: ‘Apple clang version 13.0.0 (clang-1300.0.29.30)’
    using SDK: ‘MacOSX12.1.sdk’
    clang -I"/usr/local/lib/R/include" -DNDEBUG   -I/opt/R/arm64/include -I/usr/local/include    -fPIC  -Wall -g -O2 -pedantic -mmacosx-version-min=11.0 -arch arm64 -falign-functions=64 -Wno-error=implicit-function-declaration -flto=thin -c objectFromAddress.c -o objectFromAddress.o
    clang -dynamiclib -Wl,-headerpad_max_install_names -undefined dynamic_lookup -single_module -multiply_defined suppress -Wall -g -O2 -pedantic -mmacosx-version-min=11.0 -arch arm64 -falign-functions=64 -Wno-error=implicit-function-declaration -flto=thin -fPIC -Wl,-mllvm,-threads=4 -L/usr/local/lib/R/lib -L/opt/R/arm64/lib -L/usr/local/lib -o objectFromAddress.so objectFromAddress.o -L/usr/local/lib/R/lib -lR -Wl,-framework -Wl,CoreFoundation
    
    dyn.load("objectFromAddress.so")
    (e <- new.env())
    
    <environment: 0x112811b30>
    
    identical(.Call("objectFromAddress", "112811b30"), e)
    
    [1] TRUE
    

    You'll want to ask:

    • What happens if the memory at the address is freed by the garbage collector before the call to objectFromAddress?

    In practice, you (the maintainer) would need to guarantee from R that this function is never called with an address that does not point to a live R object.
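
The address-parsing step can be tried in isolation, without R, which makes the portability point about uintptr_t and SCNxPTR easier to check. The following is a minimal sketch, not part of the answer's code: parse_address and format_address are hypothetical helper names, and the use of %n to reject trailing characters is an added assumption about what a stricter parser might do. It only converts between strings and integers and never dereferences the resulting address.

```c
#include <inttypes.h> /* uintptr_t, SCNxPTR, PRIxPTR */
#include <stddef.h>   /* size_t, NULL */
#include <stdio.h>    /* sscanf, snprintf */

/* uintptr_t is optional in C99; UINTPTR_MAX is defined only when it exists. */
#ifndef UINTPTR_MAX
# error "uintptr_t is not available on this platform"
#endif

/* Hypothetical helper: parse a hexadecimal address string, with or without
   a leading "0x", into *out. Returns 1 on success and 0 otherwise. */
int parse_address(const char *s, uintptr_t *out) {
    uintptr_t p = 0;
    int consumed = 0;

    if (s == NULL || out == NULL)
        return 0;
    /* %n stores the number of characters read so far; it does not count
       toward the return value of sscanf. */
    if (sscanf(s, "%" SCNxPTR "%n", &p, &consumed) != 1)
        return 0;
    /* Reject input with trailing characters after the hex digits. */
    if (s[consumed] != '\0')
        return 0;
    *out = p;
    return 1;
}

/* Hypothetical helper: write the address as "0x..." into buf.
   Returns 1 if the whole string fit, 0 otherwise. */
int format_address(uintptr_t p, char *buf, size_t size) {
    int n = snprintf(buf, size, "0x%" PRIxPTR, p);
    return n > 0 && (size_t) n < size;
}
```

Inside objectFromAddress, a helper like parse_address could replace the bare sscanf call, but that alone does not make the cast to SEXP safe: the caller still has to know that the address belongs to a live R object.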