Tags: r, dplyr, stringr

Use a named vector to create a column in a pipe chain


In a pipe chain, I want to use a named vector to create a new column by looking up each value of an existing column among the names of the vector:

library(tidyverse)
df <- data.frame(my_label = c("car", "house", "Bike", "ca"),
                 xx = c(1, 2, 3, 5))
#   my_label xx
# 1      car  1
# 2    house  2
# 3     Bike  3
# 4       ca  5

named_vars <- c(
  "car" = "Nice car",
  "ca" = "Cat",
  "house" = "Large house")
#           car            ca         house 
#    "Nice car"         "Cat" "Large house"

The following code works only if the named vector contains every string in the column. That is not the case here, so it returns NA for the unmatched value. When a value is missing from the vector, I want to keep the original (Bike in this example):

df %>% 
  mutate(new = named_vars[my_label])
#   my_label xx        new
# 1      car  1    Nice car
# 2    house  2 Large house
# 3     Bike  3        <NA>
# 4       ca  5         Cat

As a workaround, this produces the output I want:

df %>% 
  mutate(new = ifelse(my_label %in% names(named_vars), named_vars[my_label], my_label))
#   my_label xx         new
# 1      car  1    Nice car
# 2    house  2 Large house
# 3     Bike  3        Bike
# 4       ca  5         Cat

Is there a shorter way to write this?

I tried stringr::str_replace_all, but the ca pattern also matches inside car, which gives incorrect output (Nice Catr instead of Nice car):

library(stringr)
df %>% 
  mutate(new = str_replace_all(my_label, named_vars))
#   my_label xx         new
# 1      car  1   Nice Catr
# 2    house  2 Large house
# 3     Bike  3        Bike
# 4       ca  5         Cat
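
The Nice Catr result is consistent with the replacements being applied one after another, in the order of the named vector. A minimal sketch that reproduces the two steps by hand (step1 and step2 are illustrative names, not part of the code above):

```r
library(stringr)

# Step 1: the "car" entry replaces the whole string
step1 <- str_replace_all("car", "car", "Nice car")

# Step 2: the "ca" entry then matches inside the already replaced text
step2 <- str_replace_all(step1, "ca", "Cat")

step2
# [1] "Nice Catr"
```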

Any suggestions? Thanks.


Solution

  • Use coalesce:

    df %>%
      mutate(new = coalesce(named_vars[my_label], my_label))
    #   my_label xx         new
    # 1      car  1    Nice car
    # 2    house  2 Large house
    # 3     Bike  3        Bike
    # 4       ca  5         Cat
    

    These two expressions are functionally equivalent here:

    coalesce(named_vars[my_label], my_label)
    if_else(is.na(named_vars[my_label]), my_label, named_vars[my_label])
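
The equivalence can be checked with a small self-contained sketch. It reuses df and named_vars from the question; the columns looked_up, via_coalesce and via_if_else are hypothetical names introduced only for this illustration, and unname() is added on the assumption that plain, unnamed character columns are wanted.

```r
library(dplyr)

df <- data.frame(my_label = c("car", "house", "Bike", "ca"),
                 xx = c(1, 2, 3, 5),
                 stringsAsFactors = FALSE)

named_vars <- c("car" = "Nice car", "ca" = "Cat", "house" = "Large house")

res <- df %>%
  mutate(
    # Look up each label by name; labels without a match become NA
    looked_up    = unname(named_vars[my_label]),
    # Fall back to the original label wherever the lookup returned NA
    via_coalesce = coalesce(looked_up, my_label),
    via_if_else  = if_else(is.na(looked_up), my_label, looked_up)
  )

identical(res$via_coalesce, res$via_if_else)
# [1] TRUE
```

Both versions rely on my_label being a character column. If it were a factor (the default for strings in data.frame() before R 4.0.0), named_vars[my_label] would index by the integer codes rather than by name, so wrapping the column in as.character() first is the safer assumption.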