Search code examples
rstringdataframemutate

How to create new dataframe column with a character that comes from three other columns but only if present in a list, in R


I've been trying to do this with mutate(), str_detect(), but it's too complex and dont know how to do all the steps, please help!

I have a dataframe in which 3 cols contain let's say fruits, animals, or "none".

col1 col2 col3
apple cat none
apple dog none
pear none none
pear apple none
none none none

And then I have two lists:

fruit <- c("apple", "pear", banana") animal <- c("cat", "dog", "sheep")

I want to create two new columns in the dataframe: col4 should display only fruits from col1, col2, col3. If more than one fruit, I need them separated by commas. col5 does the same but for animals. If col1, col2, col3 dont contain an animal or a fruit, I need col4 and col5 to say "none".

col1 col2 col3 col4 col5
apple cat none apple cat
cat dog none none cat, dog
pear none none pear none
pear apple none pear, apple none
none none none none none

Solution

  • This gives you what you want:

    df <- data.frame(col1=c("apple", "cat", "pear", "pear","none"),
                     col2=c("cat", "dog", "none", "apple", "none"),
                     col3 = c("none", "none", "none", "none", "none"))
    
    fruit <- c("apple", "pear", "banana")
    animal <- c("cat", "dog", "sheep")
    
    df %>% mutate(col4 = apply(.,1, function(x) paste0(x[x != "none" & x %in% fruit], collapse = ",")),
                  col5 = apply(.,1, function(x) paste0(x[x != "none" & x %in% animal], collapse = ",")),
                  across(.cols = c("col4", "col5"), .fns = ~replace(.x, .x == "", "none")))
    

    output:

      col1  col2 col3       col4    col5
    1 apple   cat none      apple     cat
    2   cat   dog none       none cat,dog
    3  pear  none none       pear    none
    4  pear apple none pear,apple    none
    5  none  none none       none    none
    
    ``