
Immutable Objects in R - what happens here under the hood?


I am a little bit confused. I learned to never grow a vector because we do not want to create a new copy of an object every time.

# Bad
start <- Sys.time()

vector1 <- vector()

for(i in 1:100000000) {
  vector1[i] <- i
}

end <- Sys.time()

print(end - start)
## Time difference of 17.06454 secs



# Good
vector2 <- vector(length = 100000000)

start <- Sys.time()

for(i in 1:100000000) {
  vector2[i] <- i
}

end <- Sys.time()

print(end - start)
## Time difference of 4.50159 secs
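One way to check whether the "Good" loop copies the vector is base R's tracemem(), which returns an object's memory address as a string and prints a message whenever that object is duplicated. The sketch below is a scaled-down illustration, not the original benchmark. It assumes R was built with memory profiling (capabilities("profmem") is TRUE) and is run with Rscript or the plain R console, since IDEs such as RStudio can hold extra references to objects and trigger copies. The names n, addr_before, addr_after and in_place are illustrative only.

```r
# Small-scale version of the "Good" loop; n is an illustrative size
n <- 10L

# integer(n) instead of vector(length = n): vector() creates a *logical*
# vector, so the first assignment of the integer i would convert the whole
# vector to integer (a new allocation) before the in-place updates start
vector2 <- integer(n)

# tracemem() marks vector2 for tracing and returns its address, e.g. "<0x...>"
addr_before <- tracemem(vector2)

for (i in seq_len(n)) {
  # Only one name refers to this vector, so R can modify it in place
  vector2[i] <- i
}

# Calling tracemem() again returns the address of the object vector2 refers to now
addr_after <- tracemem(vector2)
untracemem(vector2)

# TRUE means the loop updated the same block of memory, without copying it
in_place <- identical(addr_before, addr_after)
print(in_place)
```

If the addresses differ, the loop made at least one copy; with the question's vector(length = 100000000), that would happen once, on the first assignment, because of the logical-to-integer conversion.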

The results seem to confirm this. However, at http://adv-r.had.co.nz/Functional-programming.html I read something like this: "Mutable state is normally hard because every time it looks like you’re modifying an object, you’re actually creating and then modifying a copy."

So am I not creating a copy every time I store a new value in the vector in example 2? Shouldn't that normally be even slower, since a vector of 100,000,000 elements would be copied on every iteration?

What do I not understand here?


Solution

  • The passage you quoted is about modifying an object inside a function. If you check out the section on memory, you'll see:

    What happens to x in the following code?

    x <- 1:10
    x[5] <- 10
    

    There are two possibilities:

    1. R modifies x in place.

    2. R makes a copy of x to a new location, modifies the copy, and then uses the name x to point to the new location.

    It turns out that R can do either depending on the circumstances. In the example above, it will modify in place.

    So in your loop, you are modifying vector2 in place. You would not be modifying the original vector in place in something like this:

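    f <- function(vec) {
        # seq_along() also handles a zero-length vec, where 1:length(vec) gives c(1, 0)
        for (i in seq_along(vec)) {
            # The first assignment copies vec, because the caller's vector is still
            # referenced outside the function; later ones modify that copy in place
            vec[i] <- i
        }
    
        return(vec)
    }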
    

    Here the first vec[i] <- i creates a copy of vec that is local to the function (the caller's vector is left unchanged), and the remaining iterations modify that copy in place. That is the situation the Hadley quote describes.
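The function case can also be observed with base R's tracemem(). A minimal sketch, assuming an R build with memory profiling enabled (capabilities("profmem") is TRUE) and a plain R console or Rscript; f() is repeated so the block runs on its own, and the names x and y are illustrative. Calling f(x) should print a single tracemem line, coming from the first vec[i] <- i, while the caller's x stays unchanged:

```r
# Same idea as the answer's f(), repeated so this block is self-contained
f <- function(vec) {
  for (i in seq_along(vec)) {
    # x still refers to the same vector, so the first assignment copies it;
    # later iterations modify that local copy in place
    vec[i] <- i
  }
  vec
}

x <- integer(5)

# Trace the caller's vector: a message is printed if it gets duplicated
tracemem(x)

y <- f(x)
untracemem(x)

print(x)  # still all zeros: f() worked on its own copy
print(y)  # the modified copy returned by f()
```

So the copy happens once per call, not once per iteration; it is the caller's object that is never modified in place.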