Search code examples
google-sheetsgoogle-sheets-formulacombinationsfrequency

Calculating the most frequent pairs in a dataset


Is it possible to calculate most frequent pairs from a combinations of pairs in a dataset with five columns? I can do this with a macro in excel, I'd be curious to see if there's a simple solution for this in google sheets.

I have a sample data and results page here :

Data:

B1  B2  B3  B4  B5
6   22  28  32  36
7   10  17  31  35
8   33  38  40  42
10  17  36  40  41
8   10  17  36  54
9   30  32  51  55
1   4   16  26  35
12  28  30  40  43
42  45  47  49  52
10  17  30  31  47
10  17  33  51  58
4   10  17  30  32
2   35  36  37  43
6   10  17  38  55
3   10  17  25  32

Results would be like:

Value1  Value2  Frequency
10  17  8
10  31  2
17  31  2
10  36  2
17  36  2
30  32  2
10  30  2
17  30  2
10  32  2
17  32  2

etc

Each row represents a data set. The pairs don't have to be adjoining. There can be numbers between them.


Solution

  • Create a combination of pairs for each row using the method mentioned here. Then REDUCE all the pairs to create a virtual 2D array. Then use QUERY to group and find the count:

    =QUERY(
      REDUCE(
        {"",""},
        A2:A16,
        LAMBDA(acc,cur, 
          {
            acc;
            QUERY(
              LAMBDA(mrg,
                REDUCE(
                  {"",""},
                  SEQUENCE(COLUMNS(mrg)-1,1,0),
                  LAMBDA(a_,c_,
                    {
                      a_;
                      LAMBDA(rg,
                        REDUCE(
                          {"",""},
                          OFFSET(rg,0,1,1,COLUMNS(rg)-1),
                          LAMBDA(a,c,{a;{INDEX(rg,1),c}})
                        )
                      )(OFFSET(mrg,0,c_,1,COLUMNS(mrg)-c_))
                    }
                  )
                )
              )(OFFSET(cur,0,0,1,5)),
            "where Col1 is not null",0
            )
          }
        )
      ),
      "Select Col1,Col2, count(Col1) group by Col1,Col2  order by count(Col1) desc "
    )
    

    Input:

    B1(A1) B2 B3 B4 B5
    6 22 28 32 36
    7 10 17 31 35
    8 33 38 40 42
    10 17 36 40 41
    8 10 17 36 54
    9 30 32 51 55
    1 4 16 26 35
    12 28 30 40 43
    42 45 47 49 52
    10 17 30 31 47
    10 17 33 51 58
    4 10 17 30 32
    2 35 36 37 43
    6 10 17 38 55
    3 10 17 25 32

    Output(partial):

    count
    10 17 8
    10 30 2
    10 31 2
    10 32 2
    10 36 2
    17 30 2
    17 31 2
    17 32 2
    17 36 2
    30 32 2
    1 4 1
    1 16 1
    1 26 1
    1 35 1
    2 35 1
    2 36 1
    2 37 1
    2 43 1
    3 10 1
    3 17 1
    3 25 1