r dataframe merge unique

How to identify unique IDs and the overlap of two datasets in R


I am working with two datasets (dataset1 and dataset2) that both contain many customer emails. I would like to identify which emails are unique to each dataset and which are "overlapping" (observed in both datasets). I would like to end up with 3 datasets:

  • one with emails unique to dataset1
  • one with emails unique to dataset2
  • one with emails that are observed in both dataset1 and dataset2 (overlap)

Here's an example for reproducibility:

dataset1 <- data.frame(email = c("A", "B", "C", "D", "E" ))
dataset2 <- data.frame(email = c("X", "Y", "Z", "D", "E" ))

The result should be:

  • result1 consists of email "A", "B", "C"
  • result2 consists of email "X", "Y", "Z"
  • result3 consists of email "D", "E"

Thank you!


Solution

  • You can use %in% to test, for each email in one dataset, whether it also appears in the other:

    result1 <- subset(dataset1, !email %in% dataset2$email)
    result1
    
    #  email
    #1     A
    #2     B
    #3     C
    
    result2 <- subset(dataset2, !email %in% dataset1$email)
    result2
    
    #  email
    #1     X
    #2     Y
    #3     Z
    
    result3 <- subset(dataset1, email %in% dataset2$email)
    result3
    
    #  email
    #4     D
    #5     E
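
If you only need the emails themselves rather than the full rows, base R's set functions are another option. This is a minimal sketch using the example data from the question; the names only1, only2 and both are just illustrative. Note that setdiff() and intersect() return plain vectors and drop duplicate values, so this assumes you don't need to keep repeated emails or other columns.

```r
dataset1 <- data.frame(email = c("A", "B", "C", "D", "E"))
dataset2 <- data.frame(email = c("X", "Y", "Z", "D", "E"))

# Set operations on the email columns (results are de-duplicated vectors)
only1 <- setdiff(dataset1$email, dataset2$email)    # emails only in dataset1
only2 <- setdiff(dataset2$email, dataset1$email)    # emails only in dataset2
both  <- intersect(dataset1$email, dataset2$email)  # emails in both datasets

# Wrap the vectors back into data frames if you need that shape
result1 <- data.frame(email = only1)
result2 <- data.frame(email = only2)
result3 <- data.frame(email = both)
```

Unlike subset(), the data frames built this way get fresh row names (1, 2, ...) instead of keeping the row numbers of the original dataset.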