r, conditional-statements, multiple-columns

Add column to dataset


I have a dataset with two columns: the name of a director and a certain award he or she has received.

Here is my data:

df <- structure(list(Name = c("Mark", "Joseph", "Lucas"), Achievement = c("Cyber Award", 
"Biology Award", "Co-author of 'New are of technology safety'"
)), class = "data.frame", row.names = c(NA, -3L))
    Name                                 Achievement
1   Mark                                 Cyber Award
2 Joseph                               Biology Award
3  Lucas Co-author of 'New are of technology safety'

Now I want to add a third column that indicates whether the achievement contains any of the strings in a vector:

my_vector <- c("cyber", "Cyber", "technology", "Technology", "computer", "Computer")

(so three keywords, each in capitalized and lowercase form).

Desired output:

    Name                                 Achievement Cyber Achievement
1   Mark                                 Cyber Award                 1
2 Joseph                               Biology Award                 0
3  Lucas Co-author of 'New are of technology safety'                 1

I have no clue where to start; I hope someone can help me.


Solution

  • First create a regex pattern by joining the keywords with paste and its collapse argument (collapse = "|").

    Then use str_detect to check whether any of these pattern strings occurs in the Achievement column.

    If so, return 1, else 0:

    library(dplyr)
    library(stringr)
    
    pattern <- paste(c("cyber", "Cyber", "technology", "Technology", "computer", "Computer"), collapse = "|")
    
    
    df %>% 
      mutate(`Cyber Achievement` = ifelse(str_detect(Achievement, pattern), 1, 0))
    

    Or in base R, using grepl:

    df$`Cyber Achievement` <- ifelse(grepl(pattern, df$Achievement), 1, 0)
    
        Name                                 Achievement Cyber Achievement
    1   Mark                                 Cyber Award                 1
    2 Joseph                               Biology Award                 0
    3  Lucas Co-author of 'New are of technology safety'                 1
    

    Data:

    df <- structure(list(Name = c("Mark", "Joseph", "Lucas"), Achievement = c("Cyber Award", 
    "Biology Award", "Co-author of 'New are of technology safety'"
    )), class = "data.frame", row.names = c(NA, -3L))
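
As a possible variation on the grepl approach above, the capitalized duplicates in the keyword vector can be dropped by matching case-insensitively. The sketch below is base R only; the shortened `keywords` vector is an assumption for illustration, not part of the original answer, and `as.integer()` is swapped in for `ifelse(..., 1, 0)`.

```r
# Same data as in the question.
df <- data.frame(
  Name = c("Mark", "Joseph", "Lucas"),
  Achievement = c("Cyber Award", "Biology Award",
                  "Co-author of 'New are of technology safety'"),
  stringsAsFactors = FALSE
)

# Hypothetical shorter keyword list: one lowercase entry per keyword.
keywords <- c("cyber", "technology", "computer")

# Join the keywords into a single alternation: "cyber|technology|computer".
pattern <- paste(keywords, collapse = "|")

# ignore.case = TRUE lets "Cyber" and "cyber" match alike;
# as.integer() converts TRUE/FALSE to 1/0 without ifelse().
df$`Cyber Achievement` <- as.integer(
  grepl(pattern, df$Achievement, ignore.case = TRUE)
)

print(df)
```

With stringr, the same effect should be achievable by wrapping the pattern in `regex(pattern, ignore_case = TRUE)` inside `str_detect`. Note that either version matches substrings, so a keyword embedded in a longer word (for example "technology" inside "biotechnology") would also count as a hit.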