
Creating a new category variable from existing variables in R


Here is the data:

var1 <- c("A", "B", "K", "L", "G", "M", "M")
var2 <- c("B", "A", "K", "L", "H", "M", "M")
mydata <- data.frame(var1, var2)
     var1 var2
1    A    B
2    B    A
3    K    K
4    L    L
5    G    H
6    M    M
7    M    M

I want to create a new category variable: if the values of any two rows are equal, those rows should be in the same category. This requires a row-by-row comparison (all possible pairs need to be checked).

For example, mydata[1,] and mydata[2,] are equal, so they should get the same value, say 1, in the new category variable. One important point: the order of var1 and var2 does not matter, meaning that [A, B] is the same as [B, A] for [var1, var2].

Sorry for the simple question; I could not solve it myself.

Edit: Expected output

 var1 var2   category
1    A    B   1 
2    B    A   1
3    K    K   2
4    L    L   3
5    G    H   4
6    M    M   5
7    M    M   5

Solution

  • mydata$var3 <- as.factor(apply(mydata, 1, function(x) { paste(x[order(x)], collapse = '') }))
    
    > mydata
      var1 var2 var3
    1    A    B   AB
    2    B    A   AB
    3    K    K   KK
    4    L    L   LL
    5    G    H   GH
    6    M    M   MM
    7    M    M   MM
    
    > str(mydata)
    'data.frame':   7 obs. of  3 variables:
     $ var1: Factor w/ 6 levels "A","B","G","K",..: 1 2 4 5 3 6 6
     $ var2: Factor w/ 6 levels "A","B","H","K",..: 2 1 4 5 3 6 6
     $ var3: Factor w/ 5 levels "AB","GH","KK",..: 1 1 3 4 2 5 5
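
    The factor above gives each group a label such as AB, while the question asks for a number. A minimal sketch of one way to get integer codes numbered by first appearance follows; it is a variation, not part of the original answer. The column name category, the "|" separator, and stringsAsFactors = FALSE are assumptions made here for illustration. The separator keeps multi-character values from colliding once pasted together.

```r
# Rebuild the question's data as plain character columns
# (stringsAsFactors = FALSE is an assumption; the original used the default).
var1 <- c("A", "B", "K", "L", "G", "M", "M")
var2 <- c("B", "A", "K", "L", "H", "M", "M")
mydata <- data.frame(var1, var2, stringsAsFactors = FALSE)

# Order-independent key per row: sort the two values, then join them.
# Only var1 and var2 are passed, so re-running does not pick up added columns.
key <- apply(mydata[, c("var1", "var2")], 1,
             function(x) paste(sort(x), collapse = "|"))

# match() against unique() numbers the keys in order of first appearance.
mydata$category <- match(key, unique(key))

print(mydata$category)
# 1 1 2 3 4 5 5
```

    Identical rows, including the two M/M rows, share one number under this scheme. Using as.integer() on the factor var3 would also give integer codes, but those follow the alphabetical order of the levels rather than the row order.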