Counting leading & trailing zeros for every row in a dataframe in R


I am trying to analyse a dataframe in which every row represents a time series. My df is structured as follows:

df <- data.frame(key = c("10A", "11xy", "445pe"), 
                 Obs1 = c(0, 22, 0),
                 Obs2 = c(10, 0, 0),
                 Obs3 = c(0,  3, 5),
                 Obs4 = c(0, 10, 0)
)

I would now like to create a new dataframe in which every row again represents a key, and the columns hold the following results:

  1. "TotalZeros": counts the total number of zeros for each row (=key)
  2. "LeadingZeros": counts the number of zeros before the first nonzero obs for each row

This means I would like to receive the following dataframe in the end:

key   TotalZeros   LeadingZeros
10A            3              1
11xy           1              0
445pe          3              2

I managed to count the total number of zeros for each row:

# Compare only the observation columns (drop key) and count the entries equal to zero
zeroCountDf <- data.frame(key = df$key, TotalZeros = rowSums(df[-1] == 0))

But I am struggling with counting the LeadingZeros. I found out how to get the position of the first nonzero value in a vector, but I don't understand how to apply this approach to my dataframe:

vec <- c(0,1,1)
min(which(vec != 0)) # returns 2, meaning the second position is first nonzero value

Can anyone explain how to count leading zeros for every row in a dataframe? I am new to R and thankful for any insight and tips. Thanks in advance.


Solution

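  • We could use rowCumsums from matrixStats along with rowSums. The comparison df[-1] != 0 flags the nonzero entries; their running row-wise sum stays at 0 until the first nonzero value appears, so negating it marks exactly the leading zeros, and rowSums counts them: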

    library(matrixStats)
    cbind(df[1], total_zeros = rowSums(df[-1] == 0), 
         Leading_zeros = rowSums(!rowCumsums(df[-1] != 0)))
    

    -output

         key total_zeros Leading_zeros
    1   10A           3              1
    2  11xy           1              0
    3 445pe           3              2
    

    Or, in the tidyverse, we may use rowwise with c_across and apply the same cumsum idea to each row:

    library(dplyr)
    df %>% 
       mutate(total_zeros = rowSums(select(., starts_with("Obs")) == 0)) %>%
       rowwise %>%
       transmute(key, total_zeros,
           Leading_zeros = sum(!cumsum(c_across(starts_with('Obs')) != 0))) %>%
          ungroup
    

    -output

    # A tibble: 3 x 3
      key   total_zeros Leading_zeros
      <chr>       <dbl>         <int>
    1 10A             3             1
    2 11xy            1             0
    3 445pe           3             2
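
For completeness, here is a minimal base R sketch of the same idea that needs no extra packages. It builds on the "position of the first nonzero value" approach from the question, but uses match instead of min(which(...)), so a row consisting only of zeros does not produce Inf and a warning. The helper name count_leading_zeros is made up for this illustration, and treating an all-zero row as "all leading zeros" is an assumption rather than something the question specifies. Since the title also mentions trailing zeros, the sketch counts them by reversing each row.

```r
df <- data.frame(key = c("10A", "11xy", "445pe"),
                 Obs1 = c(0, 22, 0),
                 Obs2 = c(10, 0, 0),
                 Obs3 = c(0,  3, 5),
                 Obs4 = c(0, 10, 0))

# Numeric matrix of the observation columns (everything except key)
obs <- as.matrix(df[-1])

# Hypothetical helper: number of zeros before the first nonzero entry.
# Assumption: an all-zero vector counts every element as a leading zero.
count_leading_zeros <- function(x) {
  first_nonzero <- match(TRUE, x != 0, nomatch = length(x) + 1L)
  first_nonzero - 1L
}

result <- data.frame(
  key          = df$key,
  TotalZeros   = rowSums(obs == 0),
  LeadingZeros = apply(obs, 1, count_leading_zeros),
  # Trailing zeros are the leading zeros of the reversed row
  TrailingZeros = apply(obs, 1, function(x) count_leading_zeros(rev(x)))
)
result
```

For larger dataframes the vectorised rowCumsums version above is likely to be faster, since apply calls the helper once per row.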