Search code examples
excelcountpivotunique

How does the SUMPRODUCT command works in this example?


The following code allows me to determine distinct values in a pivot table in Excel:

=SUMPRODUCT(($A$A:$A2=A2)*($B$2:$B2=B2))

See also: Simple Pivot Table to Count Unique Values

The code runs perfectly fine. However, can somebody help me understand how this code actually works?


Solution

  • You write: the following code allows me to determine distinct values in a pivot table in Excel

    No. That formula alone does not do that. Read on for the explanation of what does.

    There's a typo in the formula. It should be

    =SUMPRODUCT(($A$2:$A2=A2)*($B$2:$B2=B2))
    

    See the difference?

    The formula starts in row 2 and is copied down. In each row, the $A$2 reference and the $B$2 reference will stay the same. The $ signs make them absolute references. The relative references $A2 and A2 will change their row numbers when copied down, so in row 3 the A2 will change to A3 and B2 will change to B3. In the next row it will be A4 and B4, and so on.

    You may want to create a sample scenario with data similar to that in the thread you link to. Then use the "Evaluate Formula" tool on the Formulas ribbon to see step by step what is calculated. The formula evaluates from the inside out. Let's assume the formula has been copied down to row 5 and we are now looking at

    =SUMPRODUCT(($A$2:$A5=A5)*($B$2:$B5=B5))
    

    ($A$2:$A5=A5) this bit compares all the cells from A2 to A5 with the value in A5. The result is an array of four values, either true or false. The next bit ($B$2:$B5=B5) also returns an array of true or false values.

    enter image description here

    These two arrays are multiplied and the result is an array of 1 or 0 values. Each array has the same number of values.

    The first value of the first array will be multiplied with the first value of the second array. (see the red arrows)

    The second value of the first array will be multiplied with the second value of the second array. (see the blue arrows)

    and so on.

    True * True will return 1, everything else will return 0. The result of the multiplication is:

    enter image description here

    The nature of the SumProduct function is to sum the result of the multiplications (the product), so that is what it does.

    This function alone does not do anything at all to establish distinct values in Excel. In the thread you link to, the Sumproduct is wrapped in an IF statement and THAT is where the distinct values are identified.

    =IF(SUMPRODUCT(($A$2:$A2=A2)*($B$2:$B2=B2))>1,0,1)

    In plain words: If the combination of the value in column A of the current row and column B of the current row has already appeared above, return a zero, otherwise, return a 1.

    This marks distinct values of the combined columns A and B.