
Sub-assign by reference on vector in R


Is there a way to sub-assign by reference on an atomic vector,
without wrapping it in a one-column data.table just to use :=?

library(data.table)
N <- 5e7
x <- sample(letters, N, TRUE)
X <- data.table(x = x)
upd_i <- sample(N, 1L, FALSE)
system.time(x[upd_i] <- NA_character_)
#    user  system elapsed 
#    0.11    0.06    0.17 
system.time(X[upd_i, x := NA_character_])
#    user  system elapsed 
#    0.00    0.00    0.03 

If R6 can help with that, I'm open to an R6 solution, since it is already one of my dependencies.
I've already checked that <- inside an R6 object still makes a copy: gist.


Solution

  • In recent R versions (3.1-3.1.2+ or so), assignment to a vector does not copy. You will not see that by running the OP's code, though, for the following reason. Because x is reused and handed to another object (the data.table), R is not notified that x was copied at that point and has to assume that it was not, i.e. that x may still be shared. Because of that, it copies x on the next modification. (In this particular case I think it would be good to change data.table::data.table so that it notifies R that a copy has been made, but that's a separate issue; data.frame suffers from the same issue.) If you change the order of the commands a bit, you'll see no difference:

    N <- 5e7
    x <- sample(letters, N, TRUE)
    upd_i <- sample(N, 1L, FALSE)
    # no copy here:
    system.time(x[upd_i] <- NA_character_)
    #   user  system elapsed 
    #      0       0       0 
    X <- data.table(x = x)
    system.time(X[upd_i, x := NA_character_])
    #   user  system elapsed 
    #      0       0       0 
    
    # but now R will copy:
    system.time(x[upd_i] <- NA_character_)
    #   user  system elapsed 
    #   0.28    0.08    0.36 
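
    One way to check this yourself without relying on timings, as a minimal sketch: data.table exports address(), so you can compare the address of x before and after each sub-assignment. The names a0, a1, a2 and the smaller N are only for illustration. The outcome can depend on your R version and front end (for example an IDE that holds extra references to x), so no output is shown here.

```r
library(data.table)

N <- 1e5                          # smaller than in the question, illustration only
x <- sample(letters, N, TRUE)

a0 <- address(x)                  # address before any modification
x[1L] <- NA_character_            # x has not been handed to anything else yet
a1 <- address(x)

X <- data.table(x = x)            # data.table() stores its own copy of x
x[2L] <- NA_character_            # R may now have to assume x is shared
a2 <- address(x)

# TRUE means the vector stayed at the same address (modified in place),
# FALSE means R allocated a new vector for the assignment.
in_place_first  <- identical(a0, a1)
in_place_second <- identical(a1, a2)
cat("first assignment in place: ", in_place_first, "\n")
cat("second assignment in place:", in_place_second, "\n")
```

    Either way, X is unaffected by the second assignment, because data.table() copied x when X was built.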
    

    (old answer, mostly left as a curiosity)

    You can actually use the data.table := operator to modify your vector in place (I think you need R version 3.1+ to avoid the copy made by list()):

    modify.vector = function (v, idx, value) setDT(list(v))[idx, V1 := value]
    
    v = 1:5
    address(v)
    #[1] "000000002CC7AC48"
    
    modify.vector(v, 4, 10)
    v
    #[1]  1  2  3 10  5
    
    address(v)
    #[1] "000000002CC7AC48"