Search code examples
rmergedata.tablecartesian-productcross-join

R: cross join column on itself excluding symmetric results


I have a data table init that looks like this:

> init
+---+
| id|       
+---+
|  a|
+---+
|  b|
+---+
|  c|
+---+

I want to obtain all pairs for id column, so I need to cross join the init data table on itself. Additionally, I want to exclude equal and symmetric results (in my case a,b == b,a, etc.).

Desired output is:

+---+---+
|id1|id2|
+---+---+
|  a|  b|
|  a|  c|
|  b|  c|
+---+---+

How can this be done with the data.table approach?

Full cross join can be implemented as full_cj <- CJ(init$id, init$id):

+---+---+
| V1| V2|
+---+---+
|  a|  a|
|  a|  b|
|  a|  c|
|  b|  a|
|  b|  b|
|  b|  c|
|  c|  a|
|  c|  b|
|  c|  c|
+---+---+

But how can I remove symmetrical and identical results from the output?
My real data is huge, so I'm looking for an efficient solution.


Solution

  • You can use a non equi join after converting to factor

    dt[, id2:=as.factor(id)][dt, on=.(id2>id2),nomatch=0,.(id, id2)]