Search code examples
sqlgroup-bycountgroup-concatsql-server-group-concat

In SQL, create pairs based on a grouping and count their frequency


What I want is to create DISTINCT pairs of column 2 for each ID and arrange them by count.
Let's use this table as an example:

CREATE TABLE mytable
    (`ID` int, `C2` varchar(1), `C3` varchar(2))
;
    
INSERT INTO mytable
    (`ID`, `C2`, `C3`)
VALUES
    (1, 'A',' a1'),
    (1, 'B', 'b1'),
    (2, 'A', 'a2'),
    (3, 'A', 'a3'),
    (3, 'C', 'c3'),
    (3, 'A', 'a4'),
    (4, 'A', 'a1'),
    (4, 'B', 'b4'),
    (4, 'A', 'a2'),
    (4, 'D', 'd1');

For 1, pair would be A-B.
For 2, one wouldn't exist.
For 3, pair would be A-C.
For 4, pairs would be A-B, A-D, and B-D.

So the output would be:

| Pair | Cnt |
| A-B  | 2   |
| A-C  | 1   |
| A-D  | 1   | 
| B-D  | 1   |

Is this something we can do in SQL using something like GROUP_CONCAT?
I've been wrapping my head around this problem for days and still can't think of a simple solution.

Thanks!


Solution

  • I think this is a self-join and count distinct. One method is:

    select t1.c2, t2.c2, count(distinct t1.id) as cnt
    from t t1 join 
         t t2
         on t1.id = t2.id and t1.c2 < t2.c2
    group by t1.c2, t2.c2
    order by cnt desc;
    

    Depending on your data, it might be more efficient to remove duplicates first and then join:

    with tt as (
          select distinct t.id, t.c2
          from t
         )
    select t1.c2, t2.c2, count(t1.id) as cnt
    from tt t1 join 
         tt t2
         on t1.id = t2.id and t1.c2 < t2.c2
    group by t1.c2, t2.c2
    order by cnt desc;