Tags: r, dataframe, timing

Fastest way to check if dataframe is empty


What is the fastest (every microsecond counts) way to check if a data.frame is empty? I need it in the following context:

if (<df is not empty>) { do something here }

Possible solutions:

1) if(is.empty(df$V1) == FALSE), with is.empty() from the spatstat package

2) if(nrow(df) != 0)

3) Your solution

I could do:

library(microbenchmark)
microbenchmark(is.empty(df),times=100)
Unit: microseconds
         expr min  lq mean median  uq max neval
 is.empty(df) 5.8 5.8  6.9      6 6.2  66   100 

but I am not sure how to time 2). And what is your solution for checking whether a df is empty?

Thanks!


Solution

  • Suppose we have two types of data.frames:

    emptyDF = data.frame(a=1,b="bah")[0,]
    fullDF  = data.frame(a=1,b="bah")
    
    DFs = list(emptyDF,fullDF)[sample(1:2,1e4,replace=TRUE)]
    

    and your if condition shows up in a loop like

    boundDF = data.frame()
    for (i in seq_along(DFs)) {
      if (nrow(DFs[[i]]))
        boundDF <- rbind(boundDF, DFs[[i]])
    }
    

    In this case, you're approaching the problem in the wrong way. The if statement is not necessary, because rbind drops zero-row data.frames anyway: do.call(rbind, DFs), or rbindlist(DFs) from the data.table package, is faster and clearer.
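
As a rough check of that claim, the sketch below builds the same result twice in base R, once with the loop and its explicit check and once with a single do.call(rbind, DFs). The seed and the smaller sample size of 200 are assumptions added only to keep the example quick and reproducible; they are not part of the original setup.

```r
# Same toy data as in the answer: a zero-row and a one-row data.frame.
set.seed(1)  # assumption: seed added only for reproducibility
emptyDF <- data.frame(a = 1, b = "bah")[0, ]
fullDF  <- data.frame(a = 1, b = "bah")
DFs <- list(emptyDF, fullDF)[sample(1:2, 200, replace = TRUE)]

# Variant 1: loop with the explicit non-emptiness check
boundLoop <- data.frame()
for (i in seq_along(DFs)) {
  if (nrow(DFs[[i]]) > 0) {
    boundLoop <- rbind(boundLoop, DFs[[i]])
  }
}

# Variant 2: one call, no check; zero-row data.frames contribute no rows
boundCall <- do.call(rbind, DFs)

# Both variants keep exactly the rows of the non-empty data.frames
expectedRows <- sum(vapply(DFs, nrow, integer(1)))
print(c(loop = nrow(boundLoop), call = nrow(boundCall), expected = expectedRows))
```

If data.table is installed, rbindlist(DFs) could be swapped in for the do.call line; it returns a data.table rather than a plain data.frame.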

    More generally, you are looking for performance gains in the wrong place. Whatever you do inside the loop, checking whether the data.frame is non-empty will not be the step that takes the most time. There may be room to optimize that check, but as Donald Knuth said, "premature optimization is the root of all evil."
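
For anyone who still wants to time the nrow check from option 2), here is a minimal base R sketch. The helper name is_nonempty and the repetition count n are hypothetical choices made for illustration, and system.time() over a loop is only a coarse stand-in for the microbenchmark call used in the question (there, microbenchmark(nrow(df) != 0, times = 100) would follow the same pattern).

```r
emptyDF <- data.frame(a = 1, b = "bah")[0, ]
fullDF  <- data.frame(a = 1, b = "bah")

# Hypothetical helper (not from the original post):
# TRUE when the data.frame has at least one row
is_nonempty <- function(df) nrow(df) != 0L

# Time n repeated calls of the check and convert to a per-call figure
n <- 1e5  # assumption: enough repetitions to get a measurable elapsed time
elapsed <- system.time(
  for (i in seq_len(n)) is_nonempty(fullDF)
)[["elapsed"]]
per_call_us <- elapsed / n * 1e6  # rough cost per call, in microseconds

print(is_nonempty(emptyDF))
print(is_nonempty(fullDF))
print(per_call_us)
```

The per-call figure includes the overhead of the for loop and the function call, so it should be read as an upper bound on the cost of the check itself, not as a precise measurement.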