
Create a new variable based on values in two columns and list of values in a vector


I have a dataframe with columns "V1" and "V2", and a vector Z of values:

Z <- c('931', '907', '905', '902', '8552', '855', '8542', '854', '8532', '853', '852', '851', '850')

I want to add a new variable "Match" to the dataframe which takes the value 1, 2, or 3 according to the following conditions:

Match = 1 if the values in V1 and V2 are the same

Match = 2 if the values in both V1 and V2 are among the values in vector Z

Match = 3 if the value in V1 or V2 is not among the values in vector Z

The resulting dataframe should have the values shown in the Match column below.

V1      V2      Match
8552    689     3
576     8552    3
8552    907     2
8552    85      3
8552    902     2
8552    783     3
931     367     3
8552    1090    3
8552    905     2
8552    8552    1
8552    1004    3
113     907     3
8552    1001    3
8542    564     3
850     720     3

Solution

  • You can use case_when() from the {dplyr} package. Conditions are checked in order, so rows where V1 equals V2 get 1 before the other rules are tried:

    library(dplyr)

    df %>% mutate(Match = case_when(V1 == V2 ~ 1,
                                    V1 %in% Z & V2 %in% Z ~ 2,
                                    !(V1 %in% Z) | !(V2 %in% Z) ~ 3))
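
If you would rather avoid the dependency, the same three rules can be sketched in base R with nested ifelse(). This is only an illustration: the data frame below is rebuilt from the table in the question, V1 and V2 are assumed to be numeric (hence the as.character() conversion before comparing against the character vector Z), and the helper name in_z is made up for the example.

```r
# Data copied from the table in the question (assumed numeric columns)
df <- data.frame(
  V1 = c(8552, 576, 8552, 8552, 8552, 8552, 931, 8552,
         8552, 8552, 8552, 113, 8552, 8542, 850),
  V2 = c(689, 8552, 907, 85, 902, 783, 367, 1090,
         905, 8552, 1004, 907, 1001, 564, 720)
)

Z <- c('931', '907', '905', '902', '8552', '855', '8542',
       '854', '8532', '853', '852', '851', '850')

# TRUE where both V1 and V2 are exact members of Z
# (in_z is a hypothetical helper name, not from the original answer)
in_z <- as.character(df$V1) %in% Z & as.character(df$V2) %in% Z

# Rules are applied in order: equality first, then "both in Z",
# and everything else falls through to 3
df$Match <- ifelse(df$V1 == df$V2, 1,
                   ifelse(in_z, 2, 3))

print(df)
```

Checking equality first matters: a row with identical values gets 1 even when that value is not in Z. The question does not say which rule should win in that case, so swap the order if you need the opposite behaviour.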