I wonder why nobody has asked this, but how do I classify (ordinal) entries in a table according to a prioritized ruleset / tree? (possibly with naked excel and not a nested if cascade)
Minimal example (only 3 of 11 or more features shown)
Name | IsCool | IsNerdy | HasChild
Joe | 1 | 1 | 1
Charliese | 1 | 0 | 1
Peter | 1 | 0 | 0
Jonas | 0 | 0 | 0
Rules
Priority | IsCool | IsNerdy | HasChild | => Group
1. | 1 | 1 | ignore | A (at least cool&nerdy)
2. | ignore | ignore | 1 | B (not A, but has a child)
3. | 1 | 0 | 0 | C (only cool)
4. | ignore | ignore | ignore | D (everything else)
stop after first match
yielding:
Name | IsCool | IsNerdy | HasChild | Group
Joe | 1 | 1 | 1 | A
Charliese | 1 | 0 | 1 | B
Peter | 1 | 0 | 0 | C
Jonas | 0 | 0 | 0 | D
You can convert a "ruleset" into all possible combinations of the attributes (IsCool, IsNerdy, HasChild, etc) by treating "Ignore" as 0 (zero) or 1 (unity).
So, the first rule in the questions ruleset would be replaced by two rules.
IsCool ¦ IsNerdy ¦ HasChild¦ Group
1 ¦ 1 ¦ 0 ¦ A
1 ¦ 1 ¦ 1 ¦ A
Although there are only 8 possibilities across the three attributes, this approach can lead to more than 8 rules. For example in the question's ruleset, a person with the data tuple (IsCool, IsNerdy, HasChild) given by (1,1,1) would match to both Group A and Group B when the "ignore"'s in the rulset are expanded in this way. To eliminate the ambiguity this causes, it is also necessary to apply the priorities: the match to Group A has higher priority than that for B so the lookup table would include (1,1,1,A) as a row but exclude (1,1,1,B).
With a larger ruleset involving more attributes, the task of constructing the required VLOOKUP
table from the table of rules will not be without its problems, particularly if a non-manual approach is desired and just Excel is to be used rather Excel in conjunction with VBA.
An alternative approach not involving VBA which treats the ruleset as data is as follows.
Formalising the notation introduced above, a rule involving n attributes can be expressed as
(r[1],r[2],...,r[n],G)
where r[i] (i=1,...,n)
can take values of 0
, 1
or "ignore"
and G
represents the group (one of A, B, C or D in the question's example).
An instance of data can similarly be represented as
(d[1],d[2],...,d[n])
where d[i] (i=1,...,n)
takes values of 0
or 1
(but not "ignore"
)
The rule is matched if
r[i] = "ignore" OR "d[i] = r[i]" for each i from 1 to n
There is a pretty obvious way implementing this in Excel as
=AND(OR(r[1]="ignore",d[1]=r[1]),OR(r[2]="ignore",d[2]=r[2]),...,OR(r[n]="ignore",d[n]=r[n]))
where, of course, the relevant cell references are used in place of the d[i]
and r[i]
placeholders shown above and the appropriate number of OR
's are nested inside of the AND
to replace the ...
shorthand.
The pseudo-formula above has a value of either TRUE
or FALSE
with the former indicating the data instance matches to the rule and the latter that it does not.
However, this is not the end of the story as the rules still need to be applied in priority order.
So, extending the notation further, assume that the rules are listed in priority order (rule 1 has a higher priority than rule 2, which has a higher priority than rule 3, etc)
If cell C[k] holds the result of the applying the k'th rule to an instance of the data then modifying the above pseudo-formula so that C[k] now has the formula
=IF(OR(C[1],...,C[k-1]),FALSE,AND(...))
ensures that the k'th rule can be matched only if no earlier (and therefore higher priority) rule is matched. (Here the third part of the IF
is the AND
formula previously noted.)
The screengrab below shows the approach in action for the question's example.
The cells in blue are formulae. Those for TRUE/FALSE values in the second table implement the pseudo-formula discussed above. For example, cellF13
shows the result of applying rule 1 to the data instance (0,0,0) and has the following formula
=AND(OR($C$5="Ignore",$C$5=$C13),OR($D$5="Ignore",$D$5=$D13),OR($E$5="Ignore",$E$5=$E13))
NB: no IF needs to be wrapped around this formula because there is no rule with a higher priority than rule 1.
The formula for cell I13
shows the result of applying rule 4 to the same data instance and needs to take account of the higher priorities accorded to rules 1, 2 and 3. The formula in this cell is
=IF(OR($F13:H13),FALSE,AND(OR($C$8="Ignore",$C$8=$C13),OR($D$8="Ignore",$D$8=$D13),OR($E$8="Ignore",$E$8=$E13)))
The formulae in cells G13
and H13
are similar to that of I13
(left as an exercise).
By design, there can be at most one TRUE value in each row and, if the ruleset is sound, there should be exactly one such value. The formulae in the final column of the second table make this asssumption about the ruleset and simply pick out the relevant value from the final column of the first table corresponding to whichever rule shows as TRUE.
The formula in cell J13
is
=INDEX(F$5:F$8,SUMPRODUCT(1*(F13:I13),F$12:I$12))
The formulae in range F13:J13
were simply copied down the rows of the table to cells F14:J20