Search code examples
rperformancefor-loopdplyrtibble

Rowwise mutation of tibble without for loop in R


I have two tibbles and need to index the data in one tibble and insert some specific data in the other tibble based on a variable in the first tibble.

I have two tibbles:

library(dplyr)

# Set seed
set.seed(10)

# Get data
df1 <-  starwars %>% 
  select(name,species) %>%
  filter(species %in% c("Human","Wookiee","Droid")) %>%
  mutate(Fav_colour = sample(c("blue","red","green"),n(),replace=TRUE))

# Random table with typical colour preference
df2 <- tibble(Colour = c("blue","green","red"),
                   Human = c(0.5,0.3,0.1),
                   Wookiee = c(0.2,0.8,0.5),
                   Droid = c(0.3,0.1,0.5))

In df1 I need to insert the typical colour preference based on the species. To do this I can iterate through each row of the tibble in a for loop, add the relevant data, then compile into a list.

# Make empty list
data <- list()

# Iterate through each row
for (x in 1:nrow(df1)) {
  # Take a slice
  tmp <- slice(df1, x)
  
  # Add new column to slice based on data in slice (species)
  tmp$Typical <- df2 %>%
    select(Colour,tmp$species) %>%
    arrange(desc(.data[[tmp$species]])) %>% 
    slice_head(n = 1) %>% 
    select(Colour) %>% 
    as.character()
    
  #Add data to list
  data[[x]] <- tmp
}

#Recompile df1
df1 <- list.rbind(data)

I think there must be a more efficient way to do this, but can't figure out how to obtain filtered and arranged values from df2 without putting it through a for loop. I don't know how to do this, but is sapply with a function perhaps better option? What is the dplyr way of doing this without a for loop?


Solution

  • It sounds like you want from df2 the largest value per species. If we pivot_longer to make the species be specified in one column, and the value in another, we can group by species and keep the largest value. This lookup table (with colour + value per species) can be joined to the original data.

    df1 %>%
      left_join(
        df2 %>% 
          tidyr::pivot_longer(2:4, names_to = "species") %>%
          group_by(species) %>%
          slice_max(value)
      )
    

    Result

    Joining with `by = join_by(species)`
    # A tibble: 43 × 5
       name               species Fav_colour Colour value
       <chr>              <chr>   <chr>      <chr>  <dbl>
     1 Luke Skywalker     Human   green      blue     0.5
     2 C-3PO              Droid   blue       red      0.5
     3 R2-D2              Droid   red        red      0.5
     4 Darth Vader        Human   green      blue     0.5
     5 Leia Organa        Human   red        blue     0.5
     6 Owen Lars          Human   green      blue     0.5
     7 Beru Whitesun lars Human   green      blue     0.5
     8 R5-D4              Droid   green      red      0.5
     9 Biggs Darklighter  Human   green      blue     0.5
    10 Obi-Wan Kenobi     Human   green      blue     0.5
    # … with 33 more rows
    # ℹ Use `print(n = ...)` to see more rows