r dplyr magrittr

How do I efficiently change the values of a vector in a dataframe based on an existing vector in R?


I am trying to create a new vector "seasons" based on the month vector already provided in my data. I am using the built-in txhousing dataset; I have already filtered the dataframe to include only information on the city of Houston and have called this new dataframe houston.

I have managed to recategorize the twelve months into four seasons, but the way I did it is repetitive: one line per month. Does anyone have suggestions for how I can simplify this code? Whenever I tried to provide a range of months (e.g. houston[houston$month==(3:5),] %<>% mutate(seasons = "spring")), I got the warning "In month == 3:5 : longer object length is not a multiple of shorter object length".

Thank you for any help! -an R newbie

houston[houston$month==(1),] %<>% mutate(seasons = "winter")
houston[houston$month==(2),] %<>% mutate(seasons = "winter")
houston[houston$month==(3),] %<>% mutate(seasons = "spring")
houston[houston$month==(4),] %<>% mutate(seasons = "spring")
houston[houston$month==(5),] %<>% mutate(seasons = "spring")
houston[houston$month==(6),] %<>% mutate(seasons = "summer")
houston[houston$month==(7),] %<>% mutate(seasons = "summer")
houston[houston$month==(8),] %<>% mutate(seasons = "summer")
houston[houston$month==(9),] %<>% mutate(seasons = "summer")
houston[houston$month==(10),] %<>% mutate(seasons = "fall")
houston[houston$month==(11),] %<>% mutate(seasons = "fall")
houston[houston$month==(12),] %<>% mutate(seasons = "winter")

Solution

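  • dplyr::case_when() handles this cleanly: each condition to the left of ~ uses %in% to test whether month falls in a set of values, and the right-hand side gives the season to assign.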

    library(dplyr)
    
    # Reprex dataframe (always include one in your questions)
    houston <- tibble(month = 1:12)
    
    houston %>%
      mutate(seasons = case_when(month %in% c(1:2, 12) ~ "winter",
                                 month %in% 3:5        ~ "spring",
                                 month %in% 6:9        ~ "summer",
                                 month %in% 10:11      ~ "fall"))
    
    # A tibble: 12 x 2
       month seasons
       <int> <chr>  
     1     1 winter 
     2     2 winter 
     3     3 spring 
     4     4 spring 
     5     5 spring 
     6     6 summer 
     7     7 summer 
     8     8 summer 
     9     9 summer 
    10    10 fall   
    11    11 fall   
    12    12 winter
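
To see why the range comparison in the question produced a warning, here is a minimal base R sketch. The vector `month` and the name `season_lookup` are made up for illustration and are not part of the original data or answer. `==` compares element by element and recycles the shorter vector, whereas `%in%` tests each value for membership. The last lines show a possible base R alternative to case_when: a lookup vector indexed by month number, using the same month-to-season mapping as above.

```r
# A few illustrative month numbers (hypothetical sample, not txhousing itself).
month <- c(1, 3, 4, 5, 12)

# `==` works element by element and recycles 3:5 to 3, 4, 5, 3, 4,
# so it tests 1==3, 3==4, 4==5, 5==3, 12==4 and warns about the lengths.
eq_result <- suppressWarnings(month == 3:5)
# FALSE FALSE FALSE FALSE FALSE

# `%in%` asks, for each month, whether it appears anywhere in 3:5.
in_result <- month %in% 3:5
# FALSE  TRUE  TRUE  TRUE FALSE

# Base R alternative: position i of the lookup holds the season for month i.
season_lookup <- c("winter", "winter", "spring", "spring", "spring",
                   "summer", "summer", "summer", "summer",
                   "fall", "fall", "winter")
seasons <- season_lookup[month]
# "winter" "spring" "spring" "spring" "winter"
```

With the real data, the same lookup could be applied as houston$seasons <- season_lookup[houston$month], assuming month holds the integers 1 to 12 with no missing values.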