In the following example, my goal is to find the years in which values in df (transposed into df_new) stay below a threshold of -1.2 for 5 consecutive instances. I then want to return the corresponding unique values from column df_new$year as the result. My problem with using the rle() result is that its length does not match the length of df_new$year, so I can't index with it properly. The issue is that rle() does not return zeros, so k only contains runs of at least 1 value below the threshold. How can I improve this bit of code to achieve what I need? Is there a way to force rle() to include zeros in k, or should I go with another approach?
# Example reproducible df:
set.seed(125)
df <- data.frame(V1=rnorm(10,-1.5,.5),
V2=rnorm(10,-1.5,.5),
V3=rnorm(10,-1.5,.5),
V4=rnorm(10,-1.5,.5),
V5=rnorm(10,-1.5,.5),
V6=rnorm(10,-1.5,.5),
V7=rnorm(10,-1.5,.5),
V8=rnorm(10,-1.5,.5),
V9=rnorm(10,-1.5,.5),
V10=rnorm(10,-1.5,.5))
library(data.table)
df_t <- t(df)
df_long <- melt(df_t)
df_long$year <- rep(1976:1985, each=nrow(df))
df_new <- data.frame(value=df_long$value,year=df_long$year)
# Threshold values:
threshold = -1.2
consecutiveentries = 5
number <- consecutiveentries-1
# Start of the problem:
k <- rle(df_new$value < threshold)
years <- unique(df_new$year[k$lengths > number])
Current result:
> years
[1] 1976 1978 1979 1980 1982 1984 1985
What I would like it to be:
> years
[1] 1976 1980 1983 1985
This is ugly but it works :)
df_new$year[cumsum(k$lengths)[which(k$lengths >= 5)-1]+1]
Each part:
idx <- which(k$lengths >= 5)-1
gives you the positions in k$lengths right before each run whose length is greater than or equal to 5.
With cumsum(k$lengths) we then build the cumulative sum over k$lengths and take the elements at idx. As a result we have the number of rows that occur before the first row of each >=5 sequence.
Adding 1 to this result gives us the indices of the rows where each sequence begins.
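A slightly more defensive variant of the same idea could look like the sketch below. It additionally checks k$values, so that only runs below the threshold count (rle() also reports runs of FALSE), and it computes the run starts directly, which avoids the case where which(...) - 1 is 0 for a run starting in row 1 and R silently drops that zero index. The helper name run_start_years and the toy data are made up for illustration and are not part of the original code.

```r
# Hypothetical helper: years in which a run of at least `n` consecutive
# values below `threshold` begins.
run_start_years <- function(value, year, threshold, n) {
  k <- rle(value < threshold)
  ends   <- cumsum(k$lengths)        # last row of each run
  starts <- ends - k$lengths + 1     # first row of each run
  hit    <- k$values & k$lengths >= n  # below-threshold runs that are long enough
  unique(year[starts[hit]])
}

# Toy data (not the df from the question): rows 7-9 form a run of length 3
# that is ABOVE the threshold, so it must not be reported.
value <- c(-2, -2, -2, 0, -2, -2, 0, 0, 0, -2, -2, -2)
year  <- rep(2001:2004, each = 3)
run_start_years(value, year, threshold = -1.2, n = 3)
```

With the data from the question, the call would be run_start_years(df_new$value, df_new$year, threshold, consecutiveentries).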