I am intrigued by ?typeof, which mentions values that can be returned. Is there a way to call typeof(something) and get one of the following?
"promise", "char", "...", "any", "bytecode"
I discovered I can get two of the more exotic types that the help for typeof considers "unlikely to be seen at the user level", like so:
typeof(new("externalptr"))
# [1] "externalptr"
typeof(rlang::new_weakref(new("externalptr")))
# [1] "weakref"
but is there a way to get the others?
Before we try to get specific responses out of typeof, let's clarify what the function actually does. This requires a refresher on what a type is in R.
Types
Every R object is represented by a structure in the underlying C code called an SEXP, which contains a pointer to the actual data. Since there are different types of data structure that the SEXP could point to, each SEXP has a field called SEXPTYPE that tells R what sort of structure the SEXP is pointing at. This SEXPTYPE is stored as an integer.
When we call typeof in R, the integer value of the object's SEXPTYPE is looked up in the type table, which ultimately returns a string to the console to give a human-readable description of the SEXPTYPE of the object. The type table therefore contains all possible outputs of typeof.
In this sense, the type of an object in R is the lowest-level description of what sort of object it is.
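As a quick illustration of that lookup, the sketch below applies typeof to a handful of everyday objects. The list names are arbitrary labels chosen for this example, not anything defined by R.

```r
# A few everyday objects; the list names are arbitrary labels for this example
objs <- list(
  null = NULL,
  sym  = quote(x),
  clo  = function() NULL,
  env  = globalenv(),
  int  = 1L,
  dbl  = 1,
  chr  = "a",
  lst  = list()
)

# typeof() reports the type-table name for each object's SEXPTYPE
types <- vapply(objs, typeof, character(1))
print(types)
```

Each string printed here should correspond to one row of the type table.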
The SEXPTYPE values with entries in the type table are as follows:
value | SEXPTYPE | Description | typeof output |
---|---|---|---|
0 | NILSXP | NULL | "NULL" |
1 | SYMSXP | symbols | "symbol" |
2 | LISTSXP | pairlists | "pairlist" |
3 | CLOSXP | closures | "closure" |
4 | ENVSXP | environments | "environment" |
5 | PROMSXP | promises | "promise" |
6 | LANGSXP | language objects | "language" |
7 | SPECIALSXP | special functions | "special" |
8 | BUILTINSXP | builtin functions | "builtin" |
9 | CHARSXP | internal character strings | "char" |
10 | LGLSXP | logical vectors | "logical" |
13 | INTSXP | integer vectors | "integer" |
14 | REALSXP | numeric vectors | "double" |
15 | CPLXSXP | complex vectors | "complex" |
16 | STRSXP | character vectors | "character" |
17 | DOTSXP | dot-dot-dot object | "..." |
18 | ANYSXP | make “any” args work | "any" |
19 | VECSXP | list (generic vector) | "list" |
20 | EXPRSXP | expression vector | "expression" |
21 | BCODESXP | byte code | "bytecode" |
22 | EXTPTRSXP | external pointer | "externalptr" |
23 | WEAKREFSXP | weak reference | "weakref" |
24 | RAWSXP | raw vector | "raw" |
25 | S4SXP | S4 classes not of simple type | "S4" |
It is possible to get an object of each of these types in the console, but as far as I can tell, three of them cannot be obtained in base R alone. These are "char", "any" and "weakref". For these we need to use extra compiled code: either our own little snippets in Rcpp, or already-available functions in rlang.
Let's get an example of each valid type in the console.
0: NILSXP
This is just NULL
n <- NULL
typeof(n)
#> [1] "NULL"
1: SYMSXP
This is an unevaluated symbol. We can get a symbol in several ways in base R, including quote, substitute, bquote and str2lang:
s <- quote(x)
typeof(s)
#> [1] "symbol"
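The other routes to a symbol mentioned above can be checked side by side. This is a small sketch; as.name is an extra base R function not listed above, and the list names are just labels.

```r
# Each expression below yields the bare symbol `x`; `x` itself never needs to exist
syms <- list(
  quote      = quote(x),
  as.name    = as.name("x"),
  str2lang   = str2lang("x"),
  bquote     = bquote(x),
  substitute = (function(arg) substitute(arg))(x)
)

# All five should report the same type
sym_types <- vapply(syms, typeof, character(1))
print(sym_types)
```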
2: LISTSXP
Despite the name, this is not used for list objects, but rather for dotted pairlists, as used in the formals of functions. Functionally, they are similar to standard lists, but are implemented differently in the underlying C code, and do have some important differences.
p <- pairlist(a = 1)
typeof(p)
#> [1] "pairlist"
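Since function formals are the most common place to meet a pairlist, here is a short sketch using a throwaway function (the name g and its arguments are made up for the example):

```r
# A throwaway function whose formals we can inspect
g <- function(a, b = 2) NULL

fm <- formals(g)
typeof(fm)            # "pairlist"
is.pairlist(fm)       # TRUE

# Coercing to an ordinary list changes the underlying type
typeof(as.list(fm))   # "list"
names(fm)             # "a" "b"
```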
3: CLOSXP
This is used to store closures, i.e. functions that are written in R code rather than being internal C functions.
f <- function() {}
typeof(f)
#> [1] "closure"
4: ENVSXP
Used to store environments
e <- new.env()
typeof(e)
#> [1] "environment"
5: PROMSXP
In R, a promise is made of two objects: a chunk of unevaluated code, plus a pointer to an environment in which that code should be evaluated. This is very similar to a quosure in the tidyverse ecosystem, except that one can assign and pass round a quosure quite easily, delaying evaluation until it is required. A promise is more evanescent: it is forced as soon as the symbol it is bound to is evaluated, so to see one in the wild you need to have it contained in a list. A variable created via delayedAssign is also bound to a promise, but reading that variable forces the promise and returns its value.
Creating one in base R is tricky, but it can be achieved by hijacking the complex assignment mechanism. This is where one creates a function like `foo<-` <- function(x, value). The interpreter will allow you to call this function as foo(x) <- value, but in doing so converts value to a promise in place of an unevaluated code chunk. This allows us to capture the promise using match.call():
`f<-` <- function(x, value) {
list(match.call()$value)
}
x <- 1
f(x) <- "foo"
# Inspect the promise where it sits in the list; binding it to a
# symbol and then reading that symbol would force it
x[[1]]
#> <promise: 0x000002472411f410>
typeof(x[[1]])
#> [1] "promise"
6: LANGSXP
This is just an unevaluated chunk of code (though it has been parsed as syntactically correct before being stored). Again, this can be created by quote or substitute, but formulas are also stored as language objects:
l <- hello ~ world
typeof(l)
#> [1] "language"
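A brief sketch comparing a quoted call, a formula and a parsed string. The symbols inside the expressions are never evaluated, so they do not need to exist.

```r
# Three ways of ending up with a language object
calls <- list(
  quoted  = quote(f(x, 1)),
  formula = y ~ x,
  parsed  = str2lang("a + b")
)
call_types <- vapply(calls, typeof, character(1))
print(call_types)

# A call can be treated like a list whose first element is the function being called
head_of_call <- as.list(calls$parsed)[[1]]
print(head_of_call)
```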
7: SPECIALSXP
This is only used for primitive functions which pass their arguments unevaluated to the internal R machinery:
i <- `if`
typeof(i)
#> [1] "special"
8: BUILTINSXP
Again, this is only used to store the built-in functions, but these differ from "special" functions, in that their arguments are evaluated in R before being passed to the internal code.
b <- `+`
typeof(b)
#> [1] "builtin"
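The difference in argument evaluation can be seen directly. In this sketch, undefined_symbol is a made-up name that is assumed not to exist in the session.

```r
# Two specials and two builtins
prims <- list(quote = quote, `&&` = `&&`, sum = sum, length = length)
prim_types <- vapply(prims, typeof, character(1))
print(prim_types)

# A special receives its argument unevaluated, so an undefined symbol is harmless
lazy <- quote(undefined_symbol)

# A builtin has its arguments evaluated first, so the same symbol raises an error
eager <- tryCatch(length(undefined_symbol), error = function(e) "error")
print(eager)
```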
9: CHARSXP
These are not used to store R's familiar character vectors, but instead are a character type used internally by R to store atomic character strings. This allows a cache of reusable strings, and allows character vectors (type STRSXP) to be more efficient. Note that R does not like dealing with CHARSXP outside of its internal functions. It will give warnings when you have one in the console, telling you that this type of object cannot have attributes.
Counter-intuitively, this is one of the hardest to make. Perhaps the easiest way is to create a RAWSXP then change the underlying type in compiled code.
Rcpp::cppFunction("SEXP mkchar(SEXP s) {SET_TYPEOF(s, 9); return s;}")
get_char <- function(){
mkchar(as.raw(c(0x48, 0x65, 0x6c, 0x6c, 0x6f, 0x20, 0x57, 0x6f, 0x72,
0x6c, 0x64, 0x21, 0x00)))
}
chr <- get_char()
chr
#> <CHARSXP: "Hello World!">
typeof(chr)
#> [1] "char"
10: LGLSXP
These are the commonly-used logical vectors in R.
l <- TRUE
typeof(l)
#> [1] "logical"
If you're wondering about SEXPTYPE 11 and 12, these were previously used decades ago for factors and ordered factors, but are not even defined any more, so there are no hidden legacy types we can pull out of typeof corresponding to these.
13: INTSXP
R uses different types for integers and double-precision floating point numbers, but will convert from integers to doubles with the slightest provocation. The difference between integers and doubles is to some extent abstracted away from the end-user by the fact that INTSXP and REALSXP are lumped together as the mode "numeric".
I <- 1L
typeof(I)
#> [1] "integer"
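A short sketch of how readily integers turn into doubles, and how mode hides the distinction:

```r
i <- 1L

typeof(i + 1L)   # "integer": both operands are integers
typeof(i * 2L)   # "integer"
typeof(i + 1)    # "double": mixing with a double promotes the result
typeof(i / 1L)   # "double": division always returns a double

# mode() reports the same thing for both storage types
mode(i)          # "numeric"
mode(1.5)        # "numeric"
```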
14: REALSXP
These are the familiar numeric vectors
r <- 1.1
typeof(r)
#> [1] "double"
15: CPLXSXP
Complex numbers have their own storage type and are easy to create. I'm not sure why they need their own storage type, as it seems they could just as easily be implemented in S3. Presumably this is partly historical and partly due to efficiency in interfacing with various math libraries.
C <- 1 + 1i
typeof(C)
#> [1] "complex"
16: STRSXP
This is the familiar character vector.
s <- "Hello world"
typeof(s)
#> [1] "character"
17: DOTSXP
The dots in some function formals that allow for extra arbitrary arguments to be passed are implemented as a pairlist of promises. This has its own storage mode called DOTSXP. Perhaps surprisingly, this can actually be obtained without any compiled code:
d <- (function(...) get("..."))(a = 1)
d
#> <...>
typeof(d)
#> [1] "..."
18: ANYSXP
This isn't really a well-defined storage mode. As far as I can tell, it is used as a stand-in internally, predominantly for the implementation of S4 objects. The console will give you an error if you try to display an object of type "any", but it can be stored and its type reported correctly. I can't see a way to obtain it without just coercing an existing object via compiled code:
Rcpp::cppFunction("SEXP get_any(SEXP s) {SET_TYPEOF(s, 18); return s;}")
a <- get_any(1:5)
a
#> Error: unimplemented type 'any' in 'PrintValueRec'
typeof(a)
#> [1] "any"
19: VECSXP
This is the familiar all-purpose R list
l <- list()
typeof(l)
#> [1] "list"
20: EXPRSXP
Used for (lists of) unevaluated expressions
e <- expression(hello * world)
typeof(e)
#> [1] "expression"
21: BCODESXP
Used for byte code of compiled functions.
b <- .Internal(bodyCode(mean))
typeof(b)
#> [1] "bytecode"
22: EXTPTRSXP
This was already mentioned in the question and is here for completeness
e <- new("externalptr")
typeof(e)
#> [1] "externalptr"
23: WEAKREFSXP
This was already mentioned in the question and is here for completeness
w <- rlang::new_weakref(.GlobalEnv)
typeof(w)
#> [1] "weakref"
24: RAWSXP
This is just an array of unsigned 8-bit integers
r <- as.raw(1L)
typeof(r)
#> [1] "raw"
25: S4SXP
This is used for objects made in the native object-oriented S4 system
setClass("R_obj", slots = c(a = "character", b = "numeric"))
s <- new("R_obj", a = "Hello world", b = 1)
typeof(s)
#> [1] "S4"
In addition to these 24 types, there are 3 other SEXPTYPEs defined which do not have names in the type lookup table and therefore can't return unique names from typeof. These are 30 (NEWSXP), 31 (FREESXP) and 99 (FUNSXP). The first two are used internally for memory management / garbage collection, and should only ever exist for microseconds, and the third is used as a placeholder SEXPTYPE for lumping together closures / builtin functions / special functions when searching for objects of mode function. As far as I can tell, no SEXP ever actually has this SEXPTYPE.
I'd be interested to hear whether anyone has a way of creating a WEAKREFSXP without using rlang / Rcpp. It would also be good to hear about any ways of creating a CHARSXP or ANYSXP without using compiled code (though these seem to be a bit unstable when used in the console however they are produced).
As a final note, the closely related concepts of mode, storage mode and class come up when talking about types. Both mode and storage.mode are essentially aliases for type, as described here:
Storage mode
The call storage.mode(x) simply calls typeof(x) and returns it, unless typeof(x) is "closure", "builtin" or "special", in which case storage.mode returns "function". It is therefore just a slight abstraction / simplification of type.
Mode
The call mode(x) also calls typeof(x) and simplifies closures / specials / builtins into the single mode "function". In addition, it returns "numeric" for both integer and real number types. It changes "symbol" to "name", and changes "language" to either "call" or "(", depending on whether the language object starts with a parenthesis.
This diagram gives the full mapping between type, storage mode and mode:
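The same mapping can also be tabulated in the console. The sketch below covers only a sample of types, and the row labels are made up for the example.

```r
# One representative object per row; the labels are arbitrary
objs <- list(
  closure   = function() NULL,
  builtin   = sum,
  special   = quote,
  symbol    = quote(x),
  call      = quote(f(x)),
  paren     = quote((x)),
  integer   = 1L,
  double    = 1,
  character = "a",
  list      = list()
)

# Apply all three functions to every object and collect the results
mapping <- data.frame(
  typeof       = vapply(objs, typeof, character(1)),
  storage.mode = vapply(objs, storage.mode, character(1)),
  mode         = vapply(objs, mode, character(1)),
  row.names    = names(objs)
)
print(mapping)
```

The rows where the three columns disagree are the function types, the numeric types, symbols and language objects, matching the rules described above.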
Class
You can use R every day without needing to know anything about type, mode and storage mode, but a competent R user needs to know the concept of class. It is an object's class that determines which methods are dispatched on calls to generic functions, and therefore it is class that controls an object's behaviour.
You can set an object's class simply by setting its class attribute:
x <- 1
class(x) <- "foo"
class(x)
#> [1] "foo"
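To make the link between class and method dispatch concrete, here is a minimal S3 sketch. The generic describe and its methods are hypothetical names invented for this example.

```r
# A hypothetical generic with a default method and a method for class "foo"
describe <- function(x) UseMethod("describe")
describe.default <- function(x) "no special handling"
describe.foo <- function(x) "handled by the foo method"

x <- 1
before <- describe(x)   # dispatches to describe.default

class(x) <- "foo"
after <- describe(x)    # now dispatches to describe.foo

# Changing the class attribute leaves the underlying type untouched
typeof(x)               # "double"
```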
However, every object in R has a class, even objects with no class attribute:
x <- 1:5
class(x)
#> [1] "integer"
This is because R determines an object's class via the C function R_data_class. If there is a "class" attribute set, then that is the class. If there is no "class" attribute set, then first R will check for a dimension attribute. If there is a non-zero dimension attribute, then the class will be an "array" (though if it has exactly two dimensions it will have the class c("matrix", "array")). If there is no class or dimension attribute then the type of the object is retrieved. Depending on the type, R will return:
- "function" for closures, builtins or specials
- "numeric" for REALSXP (though surprisingly not for INTSXP)
- "name" for symbols
- for language objects, if the first element of the call is one of if, while, for, =, <-, ( or {, then this will be returned. Otherwise "call" is returned. This is quite an arcane system, which seems to be a way of handling different elements of the language's grammar.
- for every other type, the typeof of the object.

In summary, type is the actual type of the objects stored in memory, and mode is a partial abstraction of the actual type that gives us the familiar names we often think of as "basic types" in R. Storage mode is a close synonym for type with limited usefulness. Class is the most familiar and useful abstraction of data types in R, and if an object doesn't have a specified class, R will assign it an implicit class based on the above rules.
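The implicit class rules can be checked in the console. This sketch assumes R 4.0 or later, where a matrix reports both "matrix" and "array"; the list names are only labels.

```r
# Implicit classes of objects that carry no class attribute
implicit <- list(
  matrix  = class(matrix(1:4, nrow = 2)),
  array   = class(array(1:8, dim = c(2, 2, 2))),
  closure = class(function() NULL),
  builtin = class(sum),
  double  = class(1),
  integer = class(1L),
  symbol  = class(quote(x)),
  `if`    = class(quote(if (a) b)),
  assign  = class(quote(x <- 1)),
  call    = class(quote(f(x))),
  list    = class(list())
)
str(implicit)
```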