Get summaries of repeated consecutive values by row in R


I'm trying to get some statistics (min, max, mean) of repeated values by row in R.

My dataframe looks similar to this:

b <- as.data.frame(matrix(ncol=7, nrow=3, 
     c(3,NA,NA,4,5,NA,7,6,NA,7,NA,8,9,NA,NA,4,6,NA,NA,7,NA), byrow = TRUE))

For each row, I want to add columns with the min, max and mean of the lengths of the runs of consecutive NAs, so it should look something like this:

  V1 V2 V3 V4 V5 V6 V7 max min mean
1  3 NA NA  4  5 NA  7   2   1  1.5
2  6 NA  7 NA  8  9 NA   1   1  1.0
3 NA  4  6 NA NA  7 NA   2   1  1.33

This is just a small example of my dataset with 2000 rows and 48 columns.

Does anyone have some code for this?


Solution

  • You can apply over the rows and use rle() to get the "runs" of NA columns. Once you have the run lengths, you can simply take the summary stats of those:

    b[, c("mean", "max", "min")] <- do.call(rbind, apply(b, 1, function(x) {
      res  <- rle(is.na(x))                      # run-length encode the NA pattern of the row
      res2 <- res[["lengths"]][res[["values"]]]  # keep only the lengths of the NA runs
      data.frame(mean = mean(res2), max = max(res2), min = min(res2))
    }))

    b
    #  V1 V2 V3 V4 V5 V6 V7     mean max min
    #1  3 NA NA  4  5 NA  7 1.500000   2   1
    #2  6 NA  7 NA  8  9 NA 1.000000   1   1
    #3 NA  4  6 NA NA  7 NA 1.333333   2   1
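
    With 2000 rows, some rows may contain no NA at all, in which case max() and min() on an empty vector warn and return -Inf/Inf. The sketch below wraps the rle() idea in a small helper that counts runs of NA, as the question asks, and returns NA for such rows. The helper name na_run_stats and the choice to return NA are assumptions made for illustration, not part of the original answer.

```r
# Hypothetical helper (name chosen for illustration): summarise the
# lengths of the runs of consecutive NAs in one vector.
na_run_stats <- function(x) {
  runs <- rle(is.na(x))
  len  <- runs$lengths[runs$values]  # lengths of the NA runs only
  if (length(len) == 0L) {
    # Assumption: a row without any NA gets NA for all three statistics.
    return(c(max = NA_real_, min = NA_real_, mean = NA_real_))
  }
  c(max = max(len), min = min(len), mean = mean(len))
}

# Sample data from the question.
b <- as.data.frame(matrix(
  c(3, NA, NA, 4, 5, NA, 7,
    6, NA, 7, NA, 8, 9, NA,
    NA, 4, 6, NA, NA, 7, NA),
  ncol = 7, nrow = 3, byrow = TRUE
))

# vapply() returns one column per row of b; transpose so that each
# row of b lines up with one row of statistics.
stats <- t(vapply(seq_len(nrow(b)),
                  function(i) na_run_stats(unlist(b[i, ])),
                  c(max = 0, min = 0, mean = 0)))

b <- cbind(b, stats)
```

    For the sample data this gives max 2, 1, 2, min 1, 1, 1 and mean 1.5, 1, 1.33 for rows 1 to 3, which matches the table in the question. vapply() is used here instead of apply() plus do.call(rbind, ...) only because it checks that every row returns exactly three numbers.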