
How to calculate Jaccard similarity between two data frames in R


I have two binary data frames (values 0 and 1), and I couldn't find any method that calculates the Jaccard similarity coefficient between the two. I have only seen methods that do this calculation between the columns of a single data frame.
Let's say DF1 is:

DF1 <- data.frame(a=c(0,0,1,0),
                  b=c(1,0,1,0),
                  c=c(1,1,1,1)) 

and DF2:

DF2 <- data.frame(a=c(0,0,0,0),
                  b=c(1,0,1,0),
                  c=c(1,0,1,1)) 

What I am looking for is a single Jaccard similarity coefficient between the two data frames (not column by column).

Could you help me with this?


Solution

  • You can use dist:

    dist(t(cbind(unlist(DF1), unlist(DF2))), "binary")
    # 0.2857143
    

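    Note that this value is the Jaccard distance, so the similarity coefficient is 1 minus it: 1 - 0.2857143 = 0.7142857. As a sanity check, the distance would be 1 for DF2 <- as.data.frame(xor(DF1, 1) + 0L), the element-wise complement of DF1, and 0 for DF2 <- DF1.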
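
If you want the similarity directly rather than the distance, the same calculation can be written out by hand. The following is only a minimal sketch: jaccard_similarity is a hypothetical helper name (not part of base R or of the answer above), and it assumes both data frames have the same dimensions and contain only 0/1 values. Returning NA when neither data frame contains a 1 is a choice made here, since the coefficient is undefined in that case.

```r
# Hypothetical helper (an illustration, not a base R function):
# Jaccard similarity between two binary data frames of identical shape.
jaccard_similarity <- function(df1, df2) {
  stopifnot(identical(dim(df1), dim(df2)))

  # Flatten each data frame into one logical vector (TRUE where the cell is 1)
  x <- unlist(df1) == 1
  y <- unlist(df2) == 1

  union_size <- sum(x | y)   # cells where at least one data frame has a 1
  if (union_size == 0) {
    return(NA_real_)         # assumption: treat the all-zero case as undefined
  }
  sum(x & y) / union_size    # cells where both have a 1, over the union
}

DF1 <- data.frame(a = c(0, 0, 1, 0),
                  b = c(1, 0, 1, 0),
                  c = c(1, 1, 1, 1))

DF2 <- data.frame(a = c(0, 0, 0, 0),
                  b = c(1, 0, 1, 0),
                  c = c(1, 0, 1, 1))

jaccard_similarity(DF1, DF2)
# 0.7142857
```

Here 5 cells are 1 in both data frames and 7 cells are 1 in at least one of them, giving 5/7, which agrees with 1 minus the distance returned by dist.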