I have a following code:
for(i in 1:length(hh_temp)){
hh_temp_save = hh_temp[[i]]
for(j in 4:nrow(hh_temp_save)){
hh_temp_save$max_min_sum_5days[j] = ifelse(sum(hh_temp_save$max_min_sum[(j-4):j])>2,1,0)
hh_temp[[i]] = hh_temp_save
}
}
where hh_temp
is a list with length(hh_temp) = 12, each element in hh_temp
is a dataframe.
I tried to convert the for-loop into nested apply but I found that
lapply(hh_temp,\(x){
x = lapply(32:nrow(x),\(y){
x$max_min_sum_5days[y] = ifelse(sum(x$max_min_sum[(y-4):y])>2,1,0)
x
})
return(x)
})
I can only return the manipulated vector instead of the whole dataset. Is there any way to return the whole dataset? Does it mean nested lapply
is not suitable for manipulating single element in a vector?
I am sorry that I cannot provide the detail of dataset, some descriptive statistics can be provided:
> str(hh_temp)
List of 12
$ : tibble [3,684 × 36] (S3: tbl_df/tbl/data.frame)
..$ max_min_sum : num [1:3684] 0 0 0 0 0 0 0 0 0 0 ...
..$ max_min_sum_5days : num [1:3684] NA NA NA NA NA NA NA NA NA NA ...
$ : tibble [3,684 × 36] (S3: tbl_df/tbl/data.frame)
..$ max_min_sum : num [1:3684] 0 0 0 0 0 0 0 0 0 0 ...
..$ max_min_sum_5days : num [1:3684] NA NA NA NA NA NA NA NA NA NA ...
#repeated for 12 times
#max_min_sum is a binary variable
Sample data:
df = data.frame(a = as.factor(c(1,1,1,1,0,0,0,0,1,1,1,1,1,0,0,1,0,1)),
b = rep(NA,18))
sample_list = list(df,df,df,df,df,df)
My expected outcome is to calculate the cumulative sum of the 5 consecutive elements in a and then if the consecutive sum is greater than 2, the corresponding element in b will be recoded as 1, otherwise 0.
a | b |
---|---|
1 | NA |
1 | NA |
1 | NA |
1 | NA |
0 | 1 |
0 | 1 |
0 | 0 |
In the 5th element in a,
since there are 4 1s
and 1 0
, therefore, the consecutive sum is greater than 2, the corresponding element in b will be recoded as 1
.
My expected outcome is to calculate the cumulative sum of the 5 consecutive elements in a and then if the consecutive sum is greater than 2, the corresponding element in b will be recoded as 1, otherwise 0.
If a
is a factor variable, we need to run as.numeric(as.character(a))
beforehand to coerce a
to numeric. We can use rollsum()
from {zoo}
for the rolling sum calulation.
A solution using lapply()
applied to slightly modified sample data.
Code
# sample_list =
lapply(sample_list,
\(x) { x$b = ifelse(
zoo::rollsum(as.numeric(as.character(x$a)),
k = 5, fill = NA, align = "right") > 2L, 1L, 0L)
x})
Or in a more compact way as suggested by @G. Grothendieck like
lapply(sample_list, transform,
b = +(zoo::rollsumr(as.numeric(as.character(a)), k = 5L, fill = NA) > 2L))
Result
#> [[1]]
#> a b
#> 1 1 NA
#> 2 1 NA
#> 3 1 NA
#> 4 1 NA
#> 5 0 1
#> 6 0 1
#> 7 0 0
#> 8 0 0
#> 9 1 0
#> 10 1 0
#> 11 1 1
#> 12 1 1
#> 13 1 1
#> 14 0 1
#> 15 0 1
#> 16 1 1
#> 17 0 0
#> 18 1 0
#>
#> [[2]]
#> a b
#> 1 1 NA
#> 2 2 NA
#> 3 3 NA
#> 4 4 NA
#> 5 5 1
#> 6 6 1
#> 7 7 1
#> 8 8 1
#> 9 1 1
#> 10 1 1
#> 11 1 1
#> 12 1 1
#> 13 1 1
#> 14 1 1
#> 15 1 1
#> 16 1 1
#> 17 1 1
#> 18 1 1
#>
#> [[3]]
#> a b
#> 1 1 NA
#> 2 1 NA
#> 3 1 NA
#> 4 1 NA
#> 5 0 1
#> 6 0 1
#> 7 0 0
#> 8 0 0
#> 9 1 0
#> 10 1 0
#> 11 1 1
#> 12 1 1
#> 13 1 1
#> 14 0 1
#> 15 0 1
#> 16 1 1
#> 17 0 0
#> 18 1 0
#>
#> [[4]]
#> a b
#> 1 1 NA
#> 2 1 NA
#> 3 1 NA
#> 4 1 NA
#> 5 0 1
#> 6 0 1
#> 7 0 0
#> 8 0 0
#> 9 1 0
#> 10 1 0
#> 11 1 1
#> 12 1 1
#> 13 1 1
#> 14 0 1
#> 15 0 1
#> 16 1 1
#> 17 0 0
#> 18 1 0
#>
#> [[5]]
#> a b
#> 1 1 NA
#> 2 1 NA
#> 3 1 NA
#> 4 1 NA
#> 5 0 1
#> 6 0 1
#> 7 0 0
#> 8 0 0
#> 9 1 0
#> 10 1 0
#> 11 1 1
#> 12 1 1
#> 13 1 1
#> 14 0 1
#> 15 0 1
#> 16 1 1
#> 17 0 0
#> 18 1 0
#>
#> [[6]]
#> a b
#> 1 1 NA
#> 2 1 NA
#> 3 1 NA
#> 4 1 NA
#> 5 0 1
#> 6 0 1
#> 7 0 0
#> 8 0 0
#> 9 1 0
#> 10 1 0
#> 11 1 1
#> 12 1 1
#> 13 1 1
#> 14 0 1
#> 15 0 1
#> 16 1 1
#> 17 0 0
#> 18 1 0
Modified Data
df = data.frame(a = as.factor(c(1,1,1,1,0,0,0,0,1,1,1,1,1,0,0,1,0,1)),
b = rep(NA,18))
df2 = data.frame(a = as.factor(c(1:8, rep(1, 5), rep(1,5))),
b = rep(NA,18))
sample_list = list(df,df2,df,df,df,df)
Created on 2023-12-08 with reprex v2.0.2
Edit
If your data is rather small and you do not want to rely on an external package like {zoo}
, you might consider to write your own
rolling sum function. Very basic example:
basic_rollsum = \(x, k) {
# stopifnot(is.numeric(x), is.integer(k))
res = rep(NA, length(x))
for (i in seq_along(x))
# adjust indexing if needed
# look at na.rm-argument of sum
if (i > k) res[i] = sum( x[(i-k+1L):(i)] )
res
}
lapply(sample_list,
\(x) { x$b = ifelse(
basic_rollsum(as.numeric(as.character(x$a)), k = 5L) > 2L, 1L, 0L)
x})