Search code examples
google-sheetspivot-tablecrosstab

Is there a way to use same field as rows and columns in google sheets to count unique occurrence between columns?


Looking to convert

Task id John Jan Juliet
1 1 1 0
2 1 0 1
3 0 1 1
4 0 0 1
5 0 1 1
6 1 1 0
7 0 1 0
8 1 0 0
9 0 1 1
10 1 1 0

To

John Jan Juliet
John 3 1
Jan 3 3
Juliet 1 3

Solution

  • I have set up a new sheet ("Erik Help") in your sample spreadsheet.

    In B1:

    =SORT(FILTER(Sheet1!B1:1,Sheet1!B1:1<>""))

    This simply fills the top row with your names list, sorted alphabetically.

    In A2:

    =TRANSPOSE(SORT(FILTER(Sheet1!B1:1,Sheet1!B1:1<>"")))

    This fills A2 down with the same names list as above, just vertically.

    In B2 is the main formula for the grid (which is then dragged over and down):

    =ArrayFormula(IF( ($A2="") + (B$1="") + ($A2=B$1),, SUM(MMULT(IF((FILTER(Sheet1!$B$2:$L,Sheet1!$A$2:$A<>"")=1) * (Sheet1!$B$1:$L$1=$A2),1,0), SEQUENCE(COLUMNS(Sheet1!$B$1:$L$1),1,1,0)) * MMULT(IF((FILTER(Sheet1!$B$2:$L,Sheet1!$A$2:$A<>"")=1) * (Sheet1!$B$1:$L$1=B$1),1,0), SEQUENCE(COLUMNS(Sheet1!$B$1:$L$1),1,1,0)))))

    The first ( ) + ( ) + ( ) tests three OR conditions. If any is true, the cell will be left blank. This is what allows the formula to be dragged all the way right and down without throwing errors and, in essence, "waiting" for new data from the first two formulas above that it can process.

    The rest of the formula is too complex to warrant full explanation (e.g., how MMULT works in detail), this being a volunteer-run site. (Writing the formula took more time than I generally spend in a day on this or other forums.) But here's the gist.

    Two grids — each formed by an MMULT (matrix multiplication) — are SUMmed. The first MMULT will produce a grid the same size as the Sheet1 grid, filled with 1 only if two conditions are met: that there was already a 1 in that slot and that the name above matches the name to the right in the "Erik Help" grid. Otherwise, the result for that slot is a zero. The second MMULT forms the same size grid based on the same conditions, only this time it gets a 1 only if there is already a 1 and the name above matches the name above the cell in "Erik Help." These two grids are multiplied, and if the product is a 1, we know that BOTH names had a 1 there. Once SUMmed, we get the count of shared projects for those two names.

    As this formula is dragged, cell references not locked with a dollar sign will adjust, so that two different names will be compared by the two MMULT grids.

    Because this solution requires comparing arrays with arrays with arrays, I don't currently see how a further array solution is possible, hence the need for the formulas to be dragged. That is, each of these formulas is already jam-packed with array processing.

    Again, the formula is currently dragged all the way to Column Z and down to Row 200. However, it only references up to Column L (which is as far as your current names list goes). If your real world application has more names and thus carries over past Column L, the easiest way to change all of the formulas at once is this:

    1. Go to the "Erik Help" sheet (which you can, of course, rename as you like).

    2. Hit Ctrl-H to open the Find/Replace dialog box.

    3. Enter $L in the FIND field and $? in the REPLACE field (where ? will be the new column to which you want the results to extend, e.g., $M or $P, etc.)

    4. Choose "This sheet" from the "Search" drop-down.

    5. Check the box next to "Also search within formulas."

    6. Click the "Replace all" button.

    If the data set shrinks or grows again, do the same steps, just changing the old furthest column reference for the new furthest column reference.