How does one write nested for-loops over different variable types? My loop averages counts and increases the sample size as more years of data are pooled, but it does not repeat the sequence of steps for each factor level (the result covers only 1 of 2). What do I need to change?
Data:
library(dplyr)    # filter(), distinct(), arrange(), group_by(), summarise(), bind_rows()
library(stringr)  # str_c()
set.seed(1357)
df <- as.data.frame(rnbinom(300, mu = 0.6971, size = 1))
names(df)[1] <- "count"
df$YR <- rep(2018:2020, each=100)
df$season <- rep(c("DRY", "WET"), each=50)
df$season <- as.factor(df$season)
For later use:
substrRight <- function(x, n) {
  substr(x, nchar(x) - n + 1, nchar(x))
}
listofdfs <- list()
allYR <- unique(df$YR)
For loop:
for (i in allYR) {
  for (j in seq_along(levels(df$season))) {
    sg <- df %>% filter(YR %in% c(i:2020) & season %in% c(levels(season)[j]))
    z <- sg$count
    sg$occur <- ifelse(z > 0, 1, 0)
    sg$occur <- as.integer(sg$occur)
    sg_summary <- sg %>%
      distinct %>%
      arrange(season, YR) %>%
      group_by(season) %>%
      summarise(YR = str_c(YR, collapse = "_"),
                n = n(),
                mean_occur = round(mean(occur, na.rm = TRUE), 4))
    sg_summary$YRs_combined <- paste(substring(sg_summary$YR, 1, 4),
                                     # ^ First 4 characters in a string
                                     substrRight(sg_summary$YR, 4), sep = "-")
    # Remove YR string
    sg_summary <- sg_summary[, -2]
    listofdfs[[i]] <- sg_summary
    x <- bind_rows(listofdfs)
  }
}
Result:
> x
# A tibble: 3 × 4
season n mean_occur YRs_combined
<fct> <int> <dbl> <chr>
1 WET 16 0.812 2018-2020
2 WET 10 0.8 2019-2020
3 WET 5 0.8 2020-2020
Intended result: (Note: I think the numbers are wrong here, but the idea is just to make the loop run for each factor level).
# A tibble: 6 × 4
season n mean_occur YRs_combined
<fct> <int> <dbl> <chr>
1 DRY 16 0.812 2018-2020
2 DRY 10 0.8 2019-2020
3 DRY 6 0.833 2020-2020
4 WET 17 0.824 2018-2020
5 WET 11 0.818 2019-2020
6 WET 6 0.833 2020-2020
Your loop logic and factor-level handling are fine, but the assignment listofdfs[[i]] <- sg_summary keeps only a single entry per year: list slots are indexed by year alone, so the WET iteration overwrites the DRY result already stored for that same year.
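The overwrite can be reproduced in isolation. Here is a minimal base-R sketch, with a placeholder data.frame standing in for sg_summary; the names by_year and by_key are made up for the illustration. Indexing by year alone leaves three filled slots, all WET, while a combined year-season key keeps all six results.

```r
years   <- 2018:2020
seasons <- c("DRY", "WET")

# Indexing by year only, as in the question: for each year the WET pass
# replaces whatever the DRY pass stored in the same slot.
by_year <- list()
for (i in years) {
  for (s in seasons) {
    by_year[[i]] <- data.frame(season = s, start_year = i)
  }
}
sum(lengths(by_year) > 0)  # 3 filled slots (the rest of the list is NULL)

# Indexing by a year-season key: every combination gets its own slot.
by_key <- list()
for (i in years) {
  for (s in seasons) {
    by_key[[paste(i, s, sep = "_")]] <- data.frame(season = s, start_year = i)
  }
}
length(by_key)             # 6
x <- do.call(rbind, by_key)
```

In the original loop, the same idea would be something like listofdfs[[paste(i, j, sep = "_")]] <- sg_summary, with the rest of the loop body unchanged.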
You can actually avoid nested loops here altogether, which conveniently sidesteps this kind of indexing issue. First, create all combinations of year and season values and store them in a two-column data.frame, ys_comb. Next, refactor the inner loop body into a two-argument function, get_summary(year_, season_). Then iterate over the rows of ys_comb with purrr::pmap(): get_summary() gets called with every year-season combination, and the result is a list of data frames / tibbles.
library(dplyr)
library(purrr)
library(stringr)
df <- as.data.frame(rnbinom(300, mu = 0.6971, size = 1))
names(df)[1] <- "count"
df$YR <- rep(2018:2020, each=100)
df$season <- rep(c("DRY", "WET"), each=50)
df$season <- as.factor(df$season)
substrRight <- function(x, n) {
  substr(x, nchar(x) - n + 1, nchar(x))
}
listofdfs <- list()
allYR <- unique(df$YR)
(ys_comb <- expand.grid(year = allYR, season = levels(df$season)))
#> year season
#> 1 2018 DRY
#> 2 2019 DRY
#> 3 2020 DRY
#> 4 2018 WET
#> 5 2019 WET
#> 6 2020 WET
get_summary <- function(year_, season_) {
  sg <- df %>% filter(YR %in% c(year_:2020) & season == season_)
  z <- sg$count
  sg$occur <- ifelse(z > 0, 1, 0)
  sg$occur <- as.integer(sg$occur)
  sg_summary <- sg %>%
    distinct() %>%
    arrange(season, YR) %>%
    group_by(season) %>%
    summarise(YR = str_c(YR, collapse = "_"),
              n = n(),
              mean_occur = round(mean(occur, na.rm = TRUE), 4))
  sg_summary$YRs_combined <- paste(substring(sg_summary$YR, 1, 4),
                                   # ^ First 4 characters in a string
                                   substrRight(sg_summary$YR, 4), sep = "-")
  # Remove YR string
  sg_summary[, -2]
}
# call get_summary() with every year-season pair in ys_comb
listofdfs <- pmap(ys_comb, get_summary)
x <- list_rbind(listofdfs)
x
#> # A tibble: 6 × 4
#> season n mean_occur YRs_combined
#> <fct> <int> <dbl> <chr>
#> 1 DRY 15 0.8 2018-2020
#> 2 DRY 9 0.778 2019-2020
#> 3 DRY 4 0.75 2020-2020
#> 4 WET 16 0.812 2018-2020
#> 5 WET 10 0.8 2019-2020
#> 6 WET 6 0.833 2020-2020
Created on 2024-05-03 with reprex v2.1.0
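A side note on how pmap(ys_comb, get_summary) lines things up: pmap() passes each data.frame column to the function as a named argument. In the code above the columns are called year and season while the formals are year_ and season_, so the call appears to rely on R's partial argument matching. A slightly more defensive variant, sketched below with a toy function (label_span and combos are hypothetical names, not part of the answer), names the columns exactly like the arguments and keeps season as a plain character column.

```r
library(purrr)

# Column names match the function's argument names exactly.
combos <- expand.grid(year_ = 2018:2020,
                      season_ = c("DRY", "WET"),
                      stringsAsFactors = FALSE)

# Toy stand-in for get_summary(): just labels each combination.
label_span <- function(year_, season_) {
  paste0(season_, ": ", year_, "-2020")
}

# One call per row of combos; pmap_chr() collects the results
# into a character vector instead of a list.
labels <- pmap_chr(combos, label_span)
labels
```

expand.grid() varies its first argument fastest, so the labels come out grouped by season, in the same order as the rows of ys_comb above.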