
Weird output of dplyr::setdiff in R


I am trying to use the setdiff function from dplyr on these two data frames:

t1 <- data.frame(c(1,2,3),c(1,2,3))
names(t1) <- c("C1","C2")

t2 <- data.frame(c(1,2,3), c(0,1,2))
names(t2) <- c("C1","C2")

But I keep getting this output, which I don't expect:

> setdiff(t2,t1)
  C1 C2
1  1  0
2  2  1
3  3  2

Where am I going wrong?


Solution

  • The result is perfectly reasonable. dplyr's setdiff compares data frames row by row. t1 contains three observations (rows): (1,1), (2,2) and (3,3). t2 contains (1,0), (2,1) and (3,2), so none of the rows of t1 is present in t2.

    Now, setdiff(t2, t1) returns the rows of t2 that do not appear in t1. Since no row of t1 occurs in t2, nothing is removed, and setdiff(t2, t1) is simply t2.
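
    To see the row-wise behaviour more clearly, here is a minimal sketch. It reuses t1 and t2 from the question and adds a third data frame, t3, which is a made-up example (not from the question) that shares exactly one row with t1. It assumes dplyr is installed.

```r
library(dplyr)

t1 <- data.frame(C1 = c(1, 2, 3), C2 = c(1, 2, 3))
t2 <- data.frame(C1 = c(1, 2, 3), C2 = c(0, 1, 2))

# No row of t1 occurs in t2, so nothing is removed from t2.
only_in_t2 <- setdiff(t2, t1)

# Hypothetical data frame for illustration: shares the row (1, 1) with t1.
t3 <- data.frame(C1 = c(1, 2, 3), C2 = c(1, 1, 2))

# The shared row (1, 1) is dropped; (2, 1) and (3, 2) remain.
only_in_t3 <- setdiff(t3, t1)

print(only_in_t2)
print(only_in_t3)
```

    Here only_in_t2 still has all three rows of t2, while only_in_t3 has two rows, because a row is removed only when every column matches a row of t1.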