Search code examples
excelgroup-bynormalization

How to aggregate data 'grouped by' its parent and use that for further calculation (normalization) in Excel?


I have some data in an Excel Workbook. This data is hierarchical i.e., there is a parent-child relationship between the data across work-sheets. Here's how it looks in a particular worksheet: (There are other rows (above it) and columns to the right, but aren't important for this problem)

| Parent | Item | Score |
|   P1   |  I1  |   3   |
|   P2   |  I2  |   1   |
|   P1   |  I3  |   6   |
|   P3   |  I4  |  1.5  |
|   P4   |  I5  |   4   |

We need to have a sum total of all Items belonging to a particular parent i.e., total by parent to get 'sum of items' for each parent. (The root worksheet won't have any parents (i.e., blank column), but the structure is the same across worksheets). The need is to 'normalize' the scores of the children of a parent on a scale of 0-1 (EDIT: i.e. sum of the scores of the children must sum to 1)

I've been playing around with pivot tables and I see that you can aggregate the data by parent. But I'm not sure how exactly can I use that data to normalize the item scores. More so, the data across the excel sheets is quite dynamic and from my minimal experience with pivot tables it seems data isn't being refreshed automatically.

More so, each 'child-level worksheet' is generated from the current level worksheet (using macros). So we need a way to be able to aggregate scores by parent so that we can easily have it propagated to the next worksheet when copied (even if it's to be done manually).

I'm just at a corner with being able to do a 'Group By' (from SQL) in Excel. Any ideas?


Solution

  • If you need to find the sum of all values matching a given criterion, use the SUMIF function. I'm assuming "Parent" is in column A, "Item" in column B ans "Score" in column C. In D2, you would have to put the following formula: =SUMIF(A:A;A2;C:C), and copy it down.

    However, you don't need to know the sum of the scores of the children of a parent if you want to put the scores on a scale of 0 to 1: you only need to know the maximum score of the children of a parent. Because the MAXIF function doesn't exist, we will use an array function combining MAX and IF. Type this in D2 and press Ctrl + Shift + Enter: =MAX(IF(A:A=A2;C:C). The brackets should have been added to show that it's an array formula: {=MAX(IF(A:A=A2;C:C)}. Now you need to divide the score of the child by the maximum score of its group.