
R: How to run 'for-loop' by factor level?


How does one nest 'for-loops' over variables of different types (here a numeric year and a factor)? My loop averages occurrence counts, then increases the sample size as more years of data are added. However, it doesn't repeat the sequence of steps for each factor level (only for 1 of the 2). What do I need to add to change that?

Data:

library(dplyr)
library(stringr)

set.seed(1357)

df <- as.data.frame(rnbinom(300, mu = 0.6971, size = 1))
names(df)[1] <- "count"
df$YR <- rep(2018:2020, each=100)
df$season <- rep(c("DRY", "WET"), each=50)
df$season <- as.factor(df$season)
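`rep(c("DRY", "WET"), each=50)` has length 100, so R recycles it across the 300 rows. Each year therefore gets 50 DRY rows followed by 50 WET rows. The quick check below recreates the data; `data.frame(count = ...)` is just an equivalent way of naming the column.

```r
set.seed(1357)
df <- data.frame(count = rnbinom(300, mu = 0.6971, size = 1))
df$YR <- rep(2018:2020, each = 100)
# length-100 vector is recycled 3 times to fill the 300 rows
df$season <- as.factor(rep(c("DRY", "WET"), each = 50))

# cross-tabulate rows per year and season; every cell should be 50
table(df$YR, df$season)
```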

For later use:

substrRight <- function(x, n){
  substr(x, nchar(x)-n+1, nchar(x))
}
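Here is what the helper returns on the kind of underscore-joined YR string that `summarise()` builds in the loop below. Combined with `substring(x, 1, 4)`, it produces the first-last year label. The example string is made up for illustration.

```r
substrRight <- function(x, n){
  substr(x, nchar(x)-n+1, nchar(x))
}

yr_string <- "2018_2018_2019_2020"   # hypothetical collapsed YR value
substrRight(yr_string, 4)            # last 4 characters: "2020"
# first 4 characters + last 4 characters -> "2018-2020"
paste(substring(yr_string, 1, 4), substrRight(yr_string, 4), sep = "-")
```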

listofdfs <- list()

allYR <- unique(df$YR)

For loop:

for (i in allYR) {
  for(j in seq_along(levels(df$season))){
    
    sg <- df %>% filter(YR %in% c(i:2020) & season %in% c(levels(season)[j]))
    
    z <- sg$count
    sg$occur <- ifelse(z > 0, 1, 0)
    sg$occur <- as.integer(sg$occur)
    
    sg_summary <- sg %>% 
      distinct %>% 
      arrange(season, YR) %>% 
      group_by(season) %>%
      summarise(YR = str_c(YR, collapse="_"),
                n = n(), 
                mean_occur = round(mean(occur, na.rm = TRUE),4))
    
    sg_summary$YRs_combined <- paste(substring(sg_summary$YR, 1, 4), 
                                       # ^ First 4 characters in a string
                                       substrRight(sg_summary$YR, 4), sep="-")
    
    # Remove YR string
    sg_summary <- sg_summary[,-2]

    listofdfs[[i]] <- sg_summary
    
    x <- bind_rows(listofdfs)
    
  }
}

Result:

> x
# A tibble: 3 × 4
  season     n mean_occur YRs_combined
  <fct>  <int>      <dbl> <chr>       
1 WET       16      0.812 2018-2020   
2 WET       10      0.8   2019-2020   
3 WET        5      0.8   2020-2020 
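A stripped-down version of the loop's indexing reproduces the three-row result. `i` takes year values (2018, 2019, 2020), not positions, so `listofdfs[[i]]` pads the list with NULLs up to index 2020. `bind_rows()` skips those NULLs. Both seasons also write to the same slot, so the WET pass overwrites DRY. This toy sketch stores the season label instead of a summary.

```r
listofdfs <- list()
for (i in 2018:2020) {
  for (lvl in c("DRY", "WET")) {
    listofdfs[[i]] <- lvl   # same slot for both seasons: WET replaces DRY
  }
}
length(listofdfs)   # 2020: positions 1..2017 are NULL
unlist(listofdfs)   # "WET" "WET" "WET" (NULLs are dropped)
```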

Intended result (the numbers below are probably wrong; the point is that the loop should run for each factor level):

# A tibble: 6 × 4
  season     n mean_occur YRs_combined
  <fct>  <int>      <dbl> <chr>       
1 DRY       16      0.812 2018-2020   
2 DRY       10      0.8   2019-2020   
3 DRY        6      0.833 2020-2020   
4 WET       17      0.824 2018-2020   
5 WET       11      0.818 2019-2020   
6 WET        6      0.833 2020-2020  

Solution

  • Your loop logic and factor-level handling are fine, but the assignment listofdfs[[i]] <- sg_summary keeps only one entry per year. Because i is the year itself, the WET iteration overwrites the DRY result stored for that same year.

    You can avoid the nested loops altogether, which also sidesteps this kind of indexing problem. First, build every year-season combination and store it in a 2-column data.frame, ys_comb. Next, refactor the inner loop body into a 2-argument function, get_summary(year_, season_). Finally, iterate over the rows of ys_comb with purrr::pmap(). get_summary() is called once per year-season combination, and the result is a list of tibbles that list_rbind() stacks into one.

    library(dplyr)
    library(purrr)
    library(stringr)
    
    df <- as.data.frame(rnbinom(300, mu = 0.6971, size = 1))
    names(df)[1] <- "count"
    df$YR <- rep(2018:2020, each=100)
    df$season <- rep(c("DRY", "WET"), each=50)
    df$season <- as.factor(df$season)
    
    substrRight <- function(x, n){
      substr(x, nchar(x)-n+1, nchar(x))
    }
    
    listofdfs <- list()
    
    allYR <- unique(df$YR)
    
    (ys_comb <- expand.grid(year = allYR, season = levels(df$season)))
    #>   year season
    #> 1 2018    DRY
    #> 2 2019    DRY
    #> 3 2020    DRY
    #> 4 2018    WET
    #> 5 2019    WET
    #> 6 2020    WET
    
    get_summary <- function(year_, season_){
      sg <- df %>% filter(YR %in% c(year_:2020) & season == season_)
      
      z <- sg$count
      sg$occur <- ifelse(z > 0, 1, 0)
      sg$occur <- as.integer(sg$occur)
      
      sg_summary <- sg %>% 
        distinct() %>% 
        arrange(season, YR) %>% 
        group_by(season) %>%
        summarise(YR = str_c(YR, collapse="_"),
                  n = n(), 
                  mean_occur = round(mean(occur, na.rm = TRUE),4))
      
      sg_summary$YRs_combined <- paste(substring(sg_summary$YR, 1, 4), 
                                       # ^ First 4 characters in a string
                                       substrRight(sg_summary$YR, 4), sep="-")
      
      # Remove YR string
      sg_summary[,-2]  
    }
    
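    # call get_summary() with every year-season pair in ys_comb; the column
    # names year/season partially match the arguments year_/season_
    listofdfs <- pmap(ys_comb, get_summary)
    x <- list_rbind(listofdfs)  # list_rbind() requires purrr >= 1.0.0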
    x
    #> # A tibble: 6 × 4
    #>   season     n mean_occur YRs_combined
    #>   <fct>  <int>      <dbl> <chr>       
    #> 1 DRY       15      0.8   2018-2020   
    #> 2 DRY        9      0.778 2019-2020   
    #> 3 DRY        4      0.75  2020-2020   
    #> 4 WET       16      0.812 2018-2020   
    #> 5 WET       10      0.8   2019-2020   
    #> 6 WET        6      0.833 2020-2020
    

    Created on 2024-05-03 with reprex v2.1.0
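If you would rather keep the original nested loop, a smaller change is enough. Give each year-season pair its own list key and bind once after both loops finish. The sketch below makes a few assumptions:

- The key format "2018_DRY" is an arbitrary choice.
- The occur step is condensed into one mutate().
- The data are regenerated with the question's seed, so the numbers will differ from the reprex above.

```r
library(dplyr)
library(stringr)

set.seed(1357)
df <- data.frame(count = rnbinom(300, mu = 0.6971, size = 1))
df$YR <- rep(2018:2020, each = 100)
df$season <- factor(rep(c("DRY", "WET"), each = 50))

substrRight <- function(x, n) substr(x, nchar(x) - n + 1, nchar(x))

listofdfs <- list()
for (i in unique(df$YR)) {
  for (lvl in levels(df$season)) {
    sg_summary <- df %>%
      filter(YR %in% i:2020, season == lvl) %>%
      mutate(occur = as.integer(count > 0)) %>%   # 1 if any count, else 0
      distinct() %>%
      arrange(season, YR) %>%
      group_by(season) %>%
      summarise(YR = str_c(YR, collapse = "_"),
                n = n(),
                mean_occur = round(mean(occur, na.rm = TRUE), 4))
    sg_summary$YRs_combined <- paste(substring(sg_summary$YR, 1, 4),
                                     substrRight(sg_summary$YR, 4), sep = "-")
    # key unique to each year-season pair (hypothetical format), so WET no
    # longer overwrites DRY; drop the long YR string column
    listofdfs[[paste(i, lvl, sep = "_")]] <- sg_summary[, -2]
  }
}

# bind once after the loops; rows come out year-major, so sort by season
x <- bind_rows(listofdfs) %>% arrange(season, YRs_combined)
x
```

Moving bind_rows() out of the loop also avoids rebuilding the combined table on every iteration.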