How to append rows to an R data frame


I have looked around StackOverflow, but I cannot find a solution specific to my problem, which involves appending rows to an R data frame.

I am initializing an empty 2-column data frame, as follows.

df = data.frame(x = numeric(), y = character())

Then, my goal is to iterate through a list of values and, in each iteration, append a row to the end of the data frame. I started with the following code.

for (i in 1:10) {
    df$x = rbind(df$x, i)
    df$y = rbind(df$y, toString(i))
}

I also attempted the functions c, append, and merge without success. Please let me know if you have any suggestions.

Update from comment: I don't presume to know how R was meant to be used, but I wanted to avoid the additional line of code that would be required to update the indices on every iteration, and I cannot easily preallocate the data frame because I don't know how many rows it will ultimately need. Remember that the above is merely a toy example meant to be reproducible. Either way, thanks for your suggestion!


Solution

  • Update

    Not knowing what you are trying to do, I'll share one more suggestion: Preallocate vectors of the type you want for each column, insert values into those vectors, and then, at the end, create your data.frame.

    Continuing with Julian's f3 (a preallocated data.frame) as the fastest option so far, defined as:

    # pre-allocate space
    f3 <- function(n){
      df <- data.frame(x = numeric(n), y = character(n), stringsAsFactors = FALSE)
      for(i in 1:n){
        df$x[i] <- i
        df$y[i] <- toString(i)
      }
      df
    }
    

    Here's a similar approach, but one where the data.frame is created as the last step.

    # Use preallocated vectors
    f4 <- function(n) {
      x <- numeric(n)
      y <- character(n)
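      for (i in seq_len(n)) {  # seq_len() also handles n = 0 safely
        x[i] <- i
        y[i] <- i  # coerced to character on assignment, since y is a character vector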
      }
      data.frame(x, y, stringsAsFactors=FALSE)
    }
    

    microbenchmark from the "microbenchmark" package will give us more comprehensive insight than system.time:

    library(microbenchmark)
    microbenchmark(f1(1000), f3(1000), f4(1000), times = 5)
    # Unit: milliseconds
    #      expr         min          lq      median         uq         max neval
    #  f1(1000) 1024.539618 1029.693877 1045.972666 1055.25931 1112.769176     5
    #  f3(1000)  149.417636  150.529011  150.827393  151.02230  160.637845     5
    #  f4(1000)    7.872647    7.892395    7.901151    7.95077    8.049581     5
    

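    f1() (the rbind approach under "Original answer" below) is very inefficient because it calls data.frame on every iteration and because growing an object row by row is generally slow in R. f3() is much faster thanks to preallocation (roughly 7x in the median timings above), but assigning into a data.frame on every iteration might still be part of the bottleneck. f4() tries to bypass that bottleneck by filling plain vectors and building the data.frame once at the end, which is about 19x faster than f3() here, without compromising the loop-based approach you want to take.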
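
    If the number of rows is not known in advance (the situation described in the question's update), one common alternative is to collect each row in a list and bind everything once at the end. This is not one of the benchmarked functions; it is a minimal sketch, and both the name f5 and the `limit` stopping condition are hypothetical stand-ins for whatever ends your real loop.

    ```r
    # Hypothetical f5: collect rows in a list, then bind once at the end.
    f5 <- function(limit) {
      rows <- list()
      i <- 0
      # Stand-in for a loop whose length is not known up front
      while (i < limit) {
        i <- i + 1
        rows[[length(rows) + 1]] <- data.frame(x = i, y = toString(i),
                                               stringsAsFactors = FALSE)
      }
      if (length(rows) == 0) {
        # Nothing collected: return an empty data frame with the same columns
        return(data.frame(x = numeric(), y = character(),
                          stringsAsFactors = FALSE))
      }
      # A single rbind over all collected rows instead of one per iteration
      do.call(rbind, rows)
    }

    df <- f5(10)
    ```

    This still builds one small data.frame per row, so it is unlikely to match f4(), but it avoids repeatedly copying the growing result.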


    Original answer

    This is really not a good idea, but if you wanted to do it this way, I guess you can try:

    for (i in 1:10) {
      df <- rbind(df, data.frame(x = i, y = toString(i)))
    }
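
    The benchmark above calls f1() without defining it in this answer. A minimal sketch, assuming f1 simply wraps this rbind loop in a function (the exact definition used for the timings is not shown here):

    ```r
    # Assumed definition of f1: the rbind loop wrapped in a function
    f1 <- function(n) {
      df <- data.frame(x = numeric(), y = character(), stringsAsFactors = FALSE)
      for (i in seq_len(n)) {
        # Each iteration builds a one-row data frame and copies the whole result
        df <- rbind(df, data.frame(x = i, y = toString(i),
                                   stringsAsFactors = FALSE))
      }
      df
    }

    res <- f1(5)
    ```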
    

    Note that in your code, there is one other problem:

    • Pass stringsAsFactors = FALSE if you do not want character columns converted to factors (conversion was the default before R 4.0.0). Use: df = data.frame(x = numeric(), y = character(), stringsAsFactors = FALSE)
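
    A quick way to see the difference is to check the class of the y column. This small sketch passes stringsAsFactors explicitly in both cases, so the result does not depend on your R version's default; the name df_fac is made up for illustration.

    ```r
    # Explicit FALSE: y stays a character column
    df <- data.frame(x = numeric(), y = character(), stringsAsFactors = FALSE)
    class(df$y)      # "character"

    # Explicit TRUE: y is converted to a factor
    df_fac <- data.frame(x = numeric(), y = character(), stringsAsFactors = TRUE)
    class(df_fac$y)  # "factor"
    ```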