How to avoid recycling while trying to replace values from a vector in a dataframe column


This question arose while working on another question, "Replace list names if they exist".

I have this manipulated iris dataset with two vectors:

library(dplyr)

new_name <- c("new_setoas", "new_virginica")
to_select <- c("setosa", "virginica")
iris %>% 
  group_by(Species) %>% 
  slice(1:2) %>% 
  mutate(Species = as.character(Species))

  Sepal.Length Sepal.Width Petal.Length Petal.Width Species   
         <dbl>       <dbl>        <dbl>       <dbl> <chr>     
1          5.1         3.5          1.4         0.2 setosa    
2          4.9         3            1.4         0.2 setosa    
3          7           3.2          4.7         1.4 versicolor
4          6.4         3.2          4.5         1.5 versicolor
5          6.3         3.3          6           2.5 virginica 
6          5.8         2.7          5.1         1.9 virginica

I would like to replace the values in Species that appear in one vector (to_select) with the corresponding values from another vector (new_name).

When I do:

new_name <- c("new_setoas", "new_virginica")
to_select <- c("setosa", "virginica")
iris %>% 
  group_by(Species) %>% 
  slice(1:2) %>% 
  mutate(Species = as.character(Species)) %>% 
  mutate(Species = ifelse(Species %in% to_select, new_name, Species))

# I get:

  Sepal.Length Sepal.Width Petal.Length Petal.Width Species      
         <dbl>       <dbl>        <dbl>       <dbl> <chr>        
1          5.1         3.5          1.4         0.2 new_setoas   
2          4.9         3            1.4         0.2 **new_virginica** # should be new_setoas
3          7           3.2          4.7         1.4 versicolor   
4          6.4         3.2          4.5         1.5 versicolor   
5          6.3         3.3          6           2.5 **new_setoas** # should be new_virginica   
6          5.8         2.7          5.1         1.9 new_virginica 

I know this is happening because of recycling, but I don't know how to avoid it!
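
The recycling can be reproduced outside the pipeline. This is a minimal sketch, assuming a plain character vector `species` (a hypothetical stand-in for the Species column after slice(1:2)). ifelse() recycles its `yes` argument to the length of the test, so element i receives whatever sits at position i of the recycled new_name, not the name paired with its species:

```r
new_name  <- c("new_setoas", "new_virginica")
to_select <- c("setosa", "virginica")

# Hypothetical stand-in for the Species column (two rows per species)
species <- rep(c("setosa", "versicolor", "virginica"), each = 2)

# new_name (length 2) is recycled to length 6:
# "new_setoas" "new_virginica" "new_setoas" "new_virginica" ...
# so the replacement depends on row position, not on the species value
recycled <- ifelse(species %in% to_select, new_name, species)
recycled
```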


Solution

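  • We may use recode with a named vector (setNames(new_name, to_select)) spliced in via !!!, so each old value is mapped to its own replacement and nothing is recycled. Instead of grouping and then modifying the grouping column afterwards, the replacement can be done in the group_by step itself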

    library(dplyr)
    iris %>% 
      group_by(Species =  recode(as.character(Species),
         !!!setNames(new_name, to_select))) %>% 
      slice(1:2) 
    

    -output

    # A tibble: 6 × 5
    # Groups:   Species [3]
      Sepal.Length Sepal.Width Petal.Length Petal.Width Species      
             <dbl>       <dbl>        <dbl>       <dbl> <chr>        
    1          5.1         3.5          1.4         0.2 new_setoas   
    2          4.9         3            1.4         0.2 new_setoas   
    3          7           3.2          4.7         1.4 versicolor   
    4          6.4         3.2          4.5         1.5 versicolor   
    5          6.3         3.3          6           2.5 new_virginica
    6          5.8         2.7          5.1         1.9 new_virginica
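
An alternative that does not rely on recode is a positional lookup with match(). This is only a sketch, not part of the answer above: `species` is a hypothetical plain vector standing in for the Species column, and `idx` and `result` are illustrative names. match() returns, for every element, its position in to_select (or NA), so every argument passed to ifelse() already has the full length and nothing needs to be recycled:

```r
new_name  <- c("new_setoas", "new_virginica")
to_select <- c("setosa", "virginica")

# Hypothetical stand-in for the Species column (two rows per species)
species <- rep(c("setosa", "versicolor", "virginica"), each = 2)

# Position of each value in to_select; NA where the value is not selected
idx <- match(species, to_select)

# Take the paired new name where there is a match, else keep the original
result <- ifelse(is.na(idx), species, new_name[idx])
result
```

Inside a dplyr pipeline the same idea should carry over as mutate(Species = coalesce(new_name[match(Species, to_select)], Species)), assuming Species has already been converted to character.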