I am trying to analyse a dataframe where every row represents a timeseries. My df is structured as follows:
df <- data.frame(key = c("10A", "11xy", "445pe"),
Obs1 = c(0, 22, 0),
Obs2 = c(10, 0, 0),
Obs3 = c(0, 3, 5),
Obs4 = c(0, 10, 0)
)
I would now like to create a new dataframe with one row per key and two result columns: the total number of zeros in the row (TotalZeros) and the number of zeros before the first non-zero value (LeadingZeros).
This means I would like to receive the following dataframe in the end:
key TotalZeros LeadingZeros
10A 3 1
11xy 1 0
445pe 3 2
I managed to count the total number of zeros for each row (excluding the key column from the comparison):
zeroCountDf <- data.frame(key = df$key, TotalZeros = rowSums(df[-1] == 0))
But I am struggling with counting the LeadingZeros. I found how to get the position of the first non-zero value in a vector, but I don't understand how to apply this approach to my dataframe:
vec <- c(0,1,1)
min(which(vec != 0)) # returns 2, meaning the second position is first nonzero value
Can anyone explain how to count leading zeros for every row in a dataframe? I am new to R and thankful for any insight and tips. Thanks in advance.
We could use rowCumsums from matrixStats along with rowSums:
library(matrixStats)
cbind(df[1],
      total_zeros = rowSums(df[-1] == 0),
      Leading_zeros = rowSums(!rowCumsums(df[-1] != 0)))
-output
key total_zeros Leading_zeros
1 10A 3 1
2 11xy 1 0
3 445pe 3 2
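The cumulative-sum idea behind the matrixStats call can also be sketched in base R, which may make it easier to see why it works: the running count of non-zero values stays at 0 until the first non-zero value appears, so counting the positions where it is still 0 gives the number of leading zeros. The helper name count_leading_zeros is made up for this illustration, and apply is just one possible way to go row by row.

```r
df <- data.frame(key = c("10A", "11xy", "445pe"),
                 Obs1 = c(0, 22, 0),
                 Obs2 = c(10, 0, 0),
                 Obs3 = c(0, 3, 5),
                 Obs4 = c(0, 10, 0))

# Single row first: the values of key "445pe"
x <- c(0, 0, 5, 0)
nonzero_seen <- cumsum(x != 0)   # running count of non-zero values: 0 0 1 1
sum(nonzero_seen == 0)           # positions before the first non-zero value

# Hypothetical helper wrapping the same logic
count_leading_zeros <- function(x) sum(cumsum(x != 0) == 0)

# Apply it to every row of the observation columns (key column dropped)
result <- data.frame(key = df$key,
                     TotalZeros = rowSums(df[-1] == 0),
                     LeadingZeros = apply(df[-1], 1, count_leading_zeros))
result
```

This is slower than rowCumsums on large data because apply loops over rows in R, but it needs no extra package.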
Or, in the tidyverse, we may use rowwise together with c_across:
library(dplyr)
df %>%
  mutate(total_zeros = rowSums(select(., starts_with("Obs")) == 0)) %>%
  rowwise() %>%
  transmute(key, total_zeros,
            Leading_zeros = sum(!cumsum(c_across(starts_with("Obs")) != 0))) %>%
  ungroup()
-output
# A tibble: 3 x 3
key total_zeros Leading_zeros
<chr> <dbl> <int>
1 10A 3 1
2 11xy 1 0
3 445pe 3 2
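One edge case that may be worth checking is a row that contains only zeros. The sample data has no such row, so the following is only a sketch with made-up input: the cumulative-sum approach returns the row length, while min(which(...)) from the question returns Inf with a warning because which finds no position. The helper names are hypothetical.

```r
# Hypothetical helper using the cumulative-sum logic from the answer
count_leading_zeros <- function(x) sum(cumsum(x != 0) == 0)

all_zero <- c(0, 0, 0, 0)        # made-up row with no non-zero value
count_leading_zeros(all_zero)    # every position counts as a leading zero

# The which()-based approach has nothing to take the minimum of here
first_nonzero <- suppressWarnings(min(which(all_zero != 0)))
is.infinite(first_nonzero)

# A which()-based variant that handles the all-zero case explicitly
leading_zeros_which <- function(x) {
  idx <- which(x != 0)
  if (length(idx) == 0) length(x) else idx[1] - 1
}
leading_zeros_which(all_zero)
leading_zeros_which(c(0, 1, 1))
```

Whether an all-zero series should count as "all leading zeros" depends on the analysis, so this behaviour is a choice rather than a given.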