I am intrigued by ?typeof, which mentions values that can be returned. Is there a way to call typeof(something) and get one of the following?
"promise", "char", "...", "any", "bytecode"
I discovered I can get two of the more exotic types that the help for typeof considers "unlikely to be seen at the user level", like so:
typeof(new("externalptr"))
# [1] "externalptr"
typeof(rlang::new_weakref(new("externalptr")))
# [1] "weakref"
but is there a way to get the others?
Before we try to get specific responses out of typeof, let's clarify what the function actually does. This requires a refresher on what a type is in R.
Types
Every R object is represented by a structure in the underlying C code called an SEXP, which contains a pointer to the actual data. Since there are different types of data structure that the SEXP could point to, each SEXP has a field called SEXPTYPE that tells R what sort of structure the SEXP is pointing at. This SEXPTYPE is stored as an integer.
When we call typeof in R, the integer value of the object's SEXPTYPE is looked up in the type table, which ultimately returns a string to the console to give a human-readable description of the SEXPTYPE of the object. The type table therefore contains all possible outputs of typeof.
In this sense, the type of an object in R is the lowest-level description of what sort of object it is.
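As a quick illustration of this lookup, the sketch below calls typeof on a handful of everyday objects, then uses the internal inspect function to print an object's raw header, which includes the integer SEXPTYPE next to its C-level name. The objs list is purely illustrative, and .Internal(inspect()) is an undocumented internal whose output format may differ between R versions, so no output is shown for it.

```r
# A few everyday objects (the names are only for illustration)
objs <- list(int = 1L, dbl = 1.5, chr = "a", lst = list(), fun = sum)

# typeof() reports the human-readable name of each object's SEXPTYPE:
# integer, double, character, list and builtin respectively
types <- vapply(objs, typeof, character(1))
types

# The internal inspect() prints the object's header; for an integer
# vector this should include the SEXPTYPE number 13 and the name INTSXP
.Internal(inspect(1L))
```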
The SEXPTYPE values with entries in the type table are as follows:
value | SEXPTYPE | Description | typeof output |
---|---|---|---|
0 | NILSXP | NULL | "NULL" |
1 | SYMSXP | symbols | "symbol" |
2 | LISTSXP | pairlists | "pairlist" |
3 | CLOSXP | closures | "closure" |
4 | ENVSXP | environments | "environment" |
5 | PROMSXP | promises | "promise" |
6 | LANGSXP | language objects | "language" |
7 | SPECIALSXP | special functions | "special" |
8 | BUILTINSXP | builtin functions | "builtin" |
9 | CHARSXP | internal character strings | "char" |
10 | LGLSXP | logical vectors | "logical" |
13 | INTSXP | integer vectors | "integer" |
14 | REALSXP | numeric vectors | "double" |
15 | CPLXSXP | complex vectors | "complex" |
16 | STRSXP | character vectors | "character" |
17 | DOTSXP | dot-dot-dot object | "..." |
18 | ANYSXP | make “any” args work | "any" |
19 | VECSXP | list (generic vector) | "list" |
20 | EXPRSXP | expression vector | "expression" |
21 | BCODESXP | byte code | "bytecode" |
22 | EXTPTRSXP | external pointer | "externalptr" |
23 | WEAKREFSXP | weak reference | "weakref" |
24 | RAWSXP | raw vector | "raw" |
25* | OBJSXP, S4SXP | complex objects typically used for S4 classes | "object" or "S4"* |
*Note that since R v4.4.0, SEXPTYPE 25 can have two different possible output values from typeof; this is discussed below.
It is possible to get an object of each of these types in the console, but as far as I can tell, an object of type "any" cannot be obtained in base R alone and requires a snippet of compiled code. Furthermore, although we can create objects of type "char" and "weakref" in base R by unserializing a vector of bytes, the bytes had to be obtained by reverse-engineering objects produced using Rcpp and rlang. It's therefore possible to get anything other than an ANYSXP object in your own console using base R alone.
Let's get an example of each valid type in the console.
0: NILSXP
This is just NULL
n <- NULL
typeof(n)
#> [1] "NULL"
1: SYMSXP
This is an unevaluated symbol. We can get a symbol in several ways in base R, including quote, substitute, bquote and str2lang:
s <- quote(x)
typeof(s)
#> [1] "symbol"
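For comparison, here is a small sketch showing that the routes listed above all arrive at the same symbol. The use of as.name is an addition not mentioned above, and the anonymous wrapper around substitute is only there so that the example behaves the same wherever it is run.

```r
s1 <- quote(x)
s2 <- as.name("x")                          # as.symbol("x") is equivalent
s3 <- str2lang("x")                         # parses a string (R >= 3.6.0)
s4 <- bquote(x)
s5 <- (function(arg) substitute(arg))(x)    # x itself is never evaluated

syms <- list(s1, s2, s3, s4, s5)

# Every route yields a SYMSXP, and they are all the same symbol
all_symbols <- all(vapply(syms, is.symbol, logical(1)))
all_same    <- all(vapply(syms, identical, logical(1), s1))
c(all_symbols, all_same)
#> [1] TRUE TRUE
```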
2: LISTSXP
Despite the name, this is not used for list objects, but rather for dotted pairlists, as used in the formals of functions. Functionally, they are similar to standard lists, but are implemented differently in the underlying C code, and do have some important differences.
p <- pairlist(a = 1)
typeof(p)
#> [1] "pairlist"
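Since pairlists are mostly met through function formals, the following sketch pulls one out of a function and converts it to an ordinary list. The function fn is a hypothetical example.

```r
# A throwaway function whose formals we want to look at
fn <- function(a = 1, b) NULL

fm <- formals(fn)
fm_type <- typeof(fm)             # "pairlist"
fm_names <- names(fm)             # "a" "b"

# is.list() is TRUE for a non-empty pairlist, even though the type differs
is.list(fm)
#> [1] TRUE

# as.list() converts to a generic vector (VECSXP)
as_list_type <- typeof(as.list(fm))   # "list"
```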
3: CLOSXP
This is used to store closures, i.e. functions that are written in R code rather than being internal C functions.
f <- function() {}
typeof(f)
#> [1] "closure"
4: ENVSXP
Used to store environments
e <- new.env()
typeof(e)
#> [1] "environment"
5: PROMSXP
In R, a promise is made of two objects: a chunk of unevaluated code, plus a pointer to an environment in which that code should be evaluated. This is very similar to a quosure in the tidyverse ecosystem, except that one can assign and pass round a quosure quite easily, delaying evaluation until it is required. A promise is more evanescent; it will evaluate as soon as you assign it to a symbol, so to see one in the wild you normally need to have it contained in a list.
Creating one in base R is tricky, but it can be achieved by hijacking the complex assignment mechanism. This is where one creates a function like `foo<-` <- function(x, value). The interpreter will allow you to call this function as foo(x) <- value, but in doing so covertly changes value to a promise in place of an unevaluated code chunk in the underlying C code. This allows us to capture a promise object using match.call():
`f<-` <- function(x, value) {
list(match.call()$value)
}
x <- 1
f(x) <- "foo"
x
#> [[1]]
#> <promise: 0x0000021075a9ec10>
typeof(x[[1]])
#> [1] "promise"
To get this out of a list and into a variable, we can use delayedAssign:
delayedAssign("p", x[[1]])
p
#> <promise: 0x000002472411f410>
typeof(p)
#> [1] "promise"
eval(p)
#> [1] "foo"
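Promises are also what make function arguments lazy in everyday R code, even though we rarely see the promise objects themselves. A minimal sketch of that behaviour, using the hypothetical names lazy and later:

```r
# The second argument is wrapped in a promise that is never forced,
# so the call to stop() never runs
lazy <- function(a, b) a
lazy(1, stop("b is never forced"))
#> [1] 1

# delayedAssign() creates a promise bound to a name; the message only
# appears when the value is first needed
delayedAssign("later", {message("evaluating now"); 42})
res <- later + 1
res
#> [1] 43
```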
6: LANGSXP
This is just an unevaluated chunk of code (though it has been parsed as syntactically correct before being stored). Again, this can be created by quote or substitute, but formulas are also stored as language objects:
l <- hello ~ world
typeof(l)
#> [1] "language"
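A language object can be taken apart and modified like a list before it is evaluated. The sketch below uses a quoted call rather than a formula; the variable name cl is arbitrary.

```r
cl <- quote(sum(1, 2))
typeof(cl)
#> [1] "language"

# The first element of a call is the function name, stored as a symbol
head_type <- typeof(cl[[1]])   # "symbol"

first <- eval(cl)              # evaluates sum(1, 2), giving 3

# Swap the function being called, then evaluate again
cl[[1]] <- as.name("max")
second <- eval(cl)             # evaluates max(1, 2), giving 2
```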
7: SPECIALSXP
This is only used for primitive functions which pass their arguments unevaluated to the internal R machinery:
i <- `if`
typeof(i)
#> [1] "special"
8: BUILTINSXP
Again, this is only used to store the built-in functions, but these differ from "special" functions, in that their arguments are evaluated in R before being passed to the internal code.
b <- `+`
typeof(b)
#> [1] "builtin"
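To make the special / builtin distinction concrete, here is a short sketch that checks a few well-known primitives. The selection in prims is my own choice for illustration.

```r
prims <- list(`if` = `if`, quote = quote, `[` = `[`,
              `+` = `+`, sum = sum, length = length)

# The first three are "special", the last three are "builtin"
prim_types <- vapply(prims, typeof, character(1))
prim_types

# All of them are primitives, unlike a closure such as mean
all_prim <- all(vapply(prims, is.primitive, logical(1)))
mean_type <- typeof(mean)      # "closure"

# Because quote is special, its argument is never evaluated,
# so the call to stop() below is captured rather than run
captured <- quote(stop("never run"))
typeof(captured)
#> [1] "language"
```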
9: CHARSXP
These are not used to store R's familiar character vectors, but instead are a character type used internally by R to store atomic character strings. This allows a cache of reusable strings, and allows character vectors (type STRSXP) to be more efficient. Note that R does not like dealing with CHARSXP outside of its internal functions. It will give warnings when you have one in the console, telling you that this type of object cannot have attributes.
Counter-intuitively, this is one of the hardest to make. Perhaps the easiest way is to create a RAWSXP then change the underlying type in compiled code.
Rcpp::cppFunction("SEXP mkchar(SEXP s) {SET_TYPEOF(s, 9); return s;}")
get_char <- function(){
mkchar(as.raw(c(0x48, 0x65, 0x6c, 0x6c, 0x6f, 0x20, 0x57, 0x6f, 0x72,
0x6c, 0x64, 0x21, 0x00)))
}
chr <- get_char()
chr
#> <CHARSXP: "Hello World!">
typeof(chr)
#> [1] "char"
By reverse-engineering the above object, it is also possible to create a CHARSXP using only base R. If we serialize the above chr object, we can get the bytes that when unserialized will recreate it, so without using any compiled code, we can do:
chr <- unserialize(
as.raw(
c(88, 10, 0, 0, 0, 3, 0, 4, 1, 0, 0, 3, 5, 0, 0, 0, 0, 6, 67,
80, 49, 50, 53, 50, 0, 4, 0, 9, 0, 0, 0, 12, 72, 101, 108, 108,
111, 32, 87, 111, 114, 108, 100, 33)
)
)
chr
#> <CHARSXP: "Hello World!">
Note though that I can't find any guarantee that this will continue to work in future versions of R.
10: LGLSXP
These are the commonly-used logical vectors in R.
l <- TRUE
typeof(l)
#> [1] "logical"
If you're wondering about SEXPTYPE 11 and 12, these were used decades ago for factors and ordered factors, but are not even defined any more, so there are no hidden legacy types we can pull out of typeof corresponding to these.
13: INTSXP
R uses different types for integers and double-precision floating point numbers, but will convert from integers to doubles with the slightest provocation. The difference between integers and doubles is to some extent abstracted away from the end-user by the fact that INTSXP and REALSXP are lumped together as the mode "numeric".
I <- 1L
typeof(I)
#> [1] "integer"
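A few examples of how easily integers are promoted to doubles, and of both sharing the mode "numeric". The variable names are arbitrary.

```r
int_sum   <- typeof(1L + 1L)     # stays "integer"
mixed_sum <- typeof(1L + 1)      # "double": one double operand is enough
int_div   <- typeof(1L / 1L)     # "double": `/` always returns a double
int_idiv  <- typeof(5L %/% 2L)   # "integer": integer division keeps the type

# Both storage types report the same mode
c(mode(1L), mode(1))
#> [1] "numeric" "numeric"
```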
14: REALSXP
These are the familiar numeric vectors
r <- 1.1
typeof(r)
#> [1] "double"
15: CPLXSXP
Complex numbers have their own storage type and are easy to create. I'm not sure why they need their own storage type, as it seems they could just as easily be implemented in S3. Presumably this is partly historical and partly due to efficiency in interfacing with various math libraries.
C <- 1 + 1i
typeof(C)
#> [1] "complex"
16: STRSXP
This is the familiar character vector.
s <- "Hello world"
typeof(s)
#> [1] "character"
17: DOTSXP
The dots in some function formals that allow for extra arbitrary arguments to be passed are implemented as a pairlist of promises. This has its own storage mode called DOTSXP.
Perhaps surprisingly, this can actually be obtained without any compiled code:
d <- (function(...) get("..."))(a = 1)
d
#> <...>
typeof(d)
#> [1] "..."
18: ANYSXP
This isn't really a well-defined type. As far as I can tell, it is used as a stand-in internally, predominantly for the implementation of S4 objects. The console will give you an error if you try to display an object of type "any", but it can be stored and its type reported correctly. I can't see a way to obtain it without just coercing an existing object via compiled code:
Rcpp::cppFunction("SEXP get_any(SEXP s) {SET_TYPEOF(s, 18); return s;}")
a <- get_any(1:5)
a
#> Error: unimplemented type 'any' in 'PrintValueRec'
typeof(a)
#> [1] "any"
Unlike CHARSXP, we cannot reverse-engineer an object of type ANYSXP because it is not handled by the serialize / unserialize mechanism, causing an error to be thrown by the WriteItem function in the C code that serializes R objects when we try. Something tells me that the R developers don't want users to have anything to do with ANYSXP. Even for writers of extensions with compiled code there seems little use for it; Rcpp gets by without ever using it internally, and though rlang defines an R_TYPE_any in its r_type enum, this specific type is never mentioned again anywhere else in the package.
19: VECSXP
This is the familiar all-purpose R list
l <- list()
typeof(l)
#> [1] "list"
20: EXPRSXP
Used for (lists of) unevaluated expressions
e <- expression(hello * world)
typeof(e)
#> [1] "expression"
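An expression vector behaves like a list whose elements are symbols, calls or constants. A brief sketch, with ex and the value supplied for b chosen purely for illustration:

```r
ex <- expression(a, b + 1)
n_ex <- length(ex)               # 2 elements

# Each element keeps its own type
el_types <- c(typeof(ex[[1]]), typeof(ex[[2]]))
el_types
#> [1] "symbol"   "language"

# Elements can be evaluated individually, here supplying b from a list
val <- eval(ex[[2]], list(b = 2))
val
#> [1] 3
```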
21: BCODESXP
Used for byte code of compiled functions.
b <- .Internal(bodyCode(mean))
typeof(b)
#> [1] "bytecode"
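The same approach should work for a function you compile yourself. This sketch assumes the compiler package that ships with R and relies on the internal bodyCode function used above, which is not part of the documented API; inc is a hypothetical example function.

```r
# Byte-compile a small closure explicitly
inc <- compiler::cmpfun(function(x) x + 1)

# The function is still a closure, and body() still returns the
# original unevaluated code...
inc_type  <- typeof(inc)           # "closure"
body_type <- typeof(body(inc))     # "language"

# ...but the stored body is now byte code
code_type <- typeof(.Internal(bodyCode(inc)))
code_type
#> [1] "bytecode"
```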
22: EXTPTRSXP
This was already mentioned in the question and is here for completeness
e <- new("externalptr")
typeof(e)
#> [1] "externalptr"
23: WEAKREFSXP
This was already mentioned in the question and is here for completeness
w <- rlang::new_weakref(.GlobalEnv)
typeof(w)
#> [1] "weakref"
If we want to create one in base R, we can use the reverse-engineering trick we used for CHARSXP:
w <- unserialize(
as.raw(
c(88, 10, 0, 0, 0, 3, 0, 4, 1, 0, 0, 3, 5, 0, 0, 0, 0, 6, 67,
80, 49, 50, 53, 50, 0, 0, 0, 23)
)
)
typeof(w)
#> [1] "weakref"
24: RAWSXP
This is just an array of unsigned 8-bit integers
r <- as.raw(1L)
typeof(r)
#> [1] "raw"
25: S4SXP / OBJSXP
Uniquely, the type table contains two entries equating to SEXPTYPE 25.
The underlying data structure is the same. It is used for complex objects that are similar in some ways to a VECSXP but contain metadata that allows for type-checking of members known as slots.
Prior to R version 4.4.0, calling typeof on an object with SEXPTYPE 25 would always return "S4", since this data type was originally developed for use in R's native object-oriented S4 system.
Since R v4.4.0, this behaviour has changed to allow for the development of alternative object-oriented systems in R such as S7. By default, typeof will now return "object" for SEXPTYPE 25 unless it additionally has an S4 flag bit set in its header.
It is straightforward to produce an object of type "S4" by declaring a new S4 class:
setClass("R_obj", slots = c(a = "character", b = "numeric"))
s <- new("R_obj", a = "Hello world", b = 1)
typeof(s)
#> [1] "S4"
If we want to get an "object" from typeof in base R, we can simply turn off the S4 flag:
o <- asS3(s, complete = FALSE)
o
#> <object>
#> attr(,"a")
#> [1] "Hello world"
#> attr(,"b")
#> [1] 1
#> attr(,"class")
#> [1] "R_obj"
#> attr(,"class")attr(,"package")
#> [1] ".GlobalEnv"
typeof(o)
#> [1] "object"
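The flag itself can be checked with isS4. The sketch below repeats the steps with a hypothetical class called Pt and guards the expected typeof result by R version, on the assumption that versions before 4.4.0 report "S4" regardless of the flag, as described above.

```r
library(methods)

setClass("Pt", slots = c(x = "numeric"))
p  <- new("Pt", x = 1)
p3 <- asS3(p, complete = FALSE)    # clears the S4 bit, keeps the attributes

# The S4 flag is set on the original and cleared on the copy
flags <- c(isS4(p), isS4(p3))
flags
#> [1]  TRUE FALSE

# What typeof() reports for the unflagged object depends on the R version
expected <- if (getRversion() >= "4.4.0") "object" else "S4"
matches <- identical(typeof(p3), expected)
```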
In addition to these 24 types, there are 3 other SEXPTYPEs defined which do not have names in the type lookup table and therefore can't return unique names from typeof. These are 30 (NEWSXP), 31 (FREESXP) and 99 (FUNSXP). The first two are used internally for memory management / garbage collection, and should only ever exist for microseconds, and the third is used as a placeholder SEXPTYPE for lumping together closures / builtin functions / special functions when searching for objects of mode function. As far as I can tell, no SEXP ever actually has this SEXPTYPE.
I'd be interested to hear whether anyone has a way of creating an ANYSXP without using compiled code (though this seems to be a bit unstable when used in the console however it is produced). It would also be good to see ways of creating a "weakref" or "char" using only base R without the unserialize trick (which feels a bit like cheating).
As a final note, the closely related concepts of mode, storage mode and class come up when talking about types. Both mode and storage.mode are essentially aliases for type, as described here:
Storage mode
The call storage.mode(x) simply calls typeof(x) and returns the result, unless typeof(x) is "closure", "builtin" or "special", in which case storage.mode returns "function". It is therefore just a slight abstraction / simplification of type.
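A short check of this rule, using a few objects picked for illustration:

```r
# All three kinds of function collapse to "function"
fun_modes <- c(storage.mode(mean), storage.mode(sum), storage.mode(`if`))
fun_modes
#> [1] "function" "function" "function"

# Everything else is passed through unchanged from typeof()
other_modes <- c(storage.mode(1L), storage.mode(quote(x)), storage.mode(list()))
other_modes
#> [1] "integer" "symbol"  "list"
```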
Mode
The call mode(x) also calls typeof(x) and simplifies closures / specials / builtins into the single mode "function". In addition, it returns "numeric" for both integer and real number types. It changes "symbol" to "name", and changes "language" to either "call" or "(", depending on whether the language object starts with a parenthesis.
This diagram gives the full mapping between type, storage mode and mode:
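The rows of the mapping where the three functions disagree can also be reproduced in the console. This is a sketch covering a hand-picked set of objects rather than every type; the names objs and mapping are hypothetical.

```r
# One representative object for each type where the names differ
objs <- list(
  integer = 1L,
  double  = 1,
  symbol  = quote(x),
  call    = quote(f(x)),
  paren   = quote((x)),
  closure = mean,
  builtin = sum,
  special = `if`
)

# Apply typeof(), storage.mode() and mode() to each object in turn
mapping <- data.frame(
  type         = vapply(objs, typeof, character(1)),
  storage.mode = vapply(objs, storage.mode, character(1)),
  mode         = vapply(objs, mode, character(1)),
  stringsAsFactors = FALSE
)
mapping
```

Printing mapping gives one row per object, with the row names taken from the names in objs.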
Class
You can use R every day without needing to know anything about type, mode and storage mode, but a competent R user needs to know the concept of class. It is an object's class that determines which methods are dispatched on calls to generic functions, and therefore it is class that controls an object's behaviour.
You can set an object's class simply by setting its class attribute:
x <- 1
class(x) <- "foo"
class(x)
#> [1] "foo"
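To see why class matters for behaviour, here is a minimal sketch of S3 dispatch. The generic describe and its methods are hypothetical names invented for this example.

```r
# A generic plus two methods; UseMethod() picks one based on class(x)
describe <- function(x) UseMethod("describe")
describe.default <- function(x) "something without a describe method"
describe.foo <- function(x) "a foo object"

y <- structure(1, class = "foo")
describe(y)
#> [1] "a foo object"

# The same underlying value without the class attribute falls
# through to the default method
describe(1)
#> [1] "something without a describe method"
```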
However, every object in R has a class, even objects with no class attribute:
x <- 1:5
class(x)
#> [1] "integer"
This is because R determines an object's class via the C function do_data_class. If there is a "class" attribute set, then that is the class. If there is no "class" attribute set, then R will first check for a dimension attribute. If there is a non-zero dimension attribute, then the class will be "array" (though if it has exactly two dimensions it will have the class c("matrix", "array")). If there is no class or dimension attribute then the type of the object is retrieved. Depending on the type, R will return:
- "function" for closures, builtins or specials
- "numeric" for REALSXP (though surprisingly not for INTSXP)
- "name" for symbols
- for language objects, the first element is examined: if it is one of if, while, for, =, <-, ( or {, then this will be returned. Otherwise "call" is returned. This is quite an arcane system, which seems to be a way of handling different elements of the language's grammar.
- for anything else, the typeof the object.

In summary, type is the actual type of the objects stored in memory, and mode is a partial abstraction of the actual type that gives us the familiar names we often think of as "basic types" in R. Storage mode is a close synonym for type with limited usefulness. Class is the most familiar and useful abstraction of data types in R, and if an object doesn't have a specified class, R will assign it an implicit class based on the above rules.
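The implicit class rules described above can be checked directly in the console. This sketch assumes R 4.0.0 or later, where a matrix has the two-element class c("matrix", "array").

```r
# Dimension attribute, no class attribute
m <- matrix(1:4, nrow = 2)
a <- array(1:8, dim = c(2, 2, 2))
class(m)                       # "matrix" "array" in R >= 4.0.0
class(a)
#> [1] "array"

# No class or dimension attribute: the result depends on the type
basic <- c(class(1), class(1L), class(sum), class(quote(x)))
basic
#> [1] "numeric"  "integer"  "function" "name"

# Language objects are classed by their first element
lang <- c(class(quote(if (a) b)), class(quote(x <- 1)),
          class(quote({x})), class(quote(f(x))))
lang
#> [1] "if"   "<-"   "{"    "call"
```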