How to create a new variable based on the values in two columns


I want to add a new column to a data frame whose value depends on the values in two existing columns.

I have the following data:

Animal.1 <- c("A", "B", "C", "B", "A" )
Animal.2 <- c("B", "A", "A", "C", "C")
df <- data.frame(Animal.1, Animal.2)

If either of the following conditions is met:

(Animal.1 = A and Animal.2 = B) OR (Animal.1 = B and Animal.2 = A)

I would like the new column, pair.code, to equal 1.

I would like a different number for every pair of animal IDs, but the same number whenever the same two IDs appear together, regardless of which one is in Animal.1 and which is in Animal.2.

The final data should look like this:

Animal.1 <- c("A", "B", "C", "B", "A")
Animal.2 <- c("B", "A", "A", "C", "C")
pair.code <- c(1, 1, 2, 3, 2)
df <- data.frame(Animal.1, Animal.2, pair.code)

Solution

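  • We can first sort the two IDs within each row, so that the column order no longer matters, and then create 'pair.code' with match, which numbers each distinct pair in order of first appearance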

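    # Sort the two IDs within each row so (A, B) and (B, A) become identical
    m1 <- t(apply(df[c("Animal.1", "Animal.2")], 1, sort))
    # Collapse each sorted pair into a single key
    v1 <- paste(m1[,1], m1[,2])
    # Number the keys in order of first appearance
    df$pair.code <- match(v1, unique(v1))
    df$pair.code
    #[1] 1 1 2 3 2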
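
The same idea can also be sketched without the row-wise apply, which may matter on larger data. This is only an alternative under stated assumptions, not part of the original answer: it assumes both columns are character vectors (hence stringsAsFactors = FALSE, which is my addition) and uses base R's pmin and pmax to pick the alphabetically smaller and larger ID in each row. The names lo, hi and key are illustrative.

```r
# Example data from the question; stringsAsFactors = FALSE is an assumption
# so that pmin/pmax compare plain character strings on any R version
Animal.1 <- c("A", "B", "C", "B", "A")
Animal.2 <- c("B", "A", "A", "C", "C")
df <- data.frame(Animal.1, Animal.2, stringsAsFactors = FALSE)

# Element-wise smaller and larger ID, so each pair gets a canonical order
lo <- pmin(df$Animal.1, df$Animal.2)
hi <- pmax(df$Animal.1, df$Animal.2)

# One key per unordered pair, then number keys by first appearance
key <- paste(lo, hi)
df$pair.code <- match(key, unique(key))
df$pair.code
#[1] 1 1 2 3 2
```

If the IDs themselves can contain spaces, passing a sep to paste that cannot occur in the IDs avoids two different pairs collapsing into the same key.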