
How to avoid same variable being used twice in an Orange CN2 rule?


I am using Orange CN2 for rule induction. Sometimes a variable is used twice in a rule. Here is an example rule: "IF score > 40 and amount < 100 and score > 55 THEN status = bad". Is there a way to configure CN2 so that a variable can be used only once in a rule? Additionally, is it possible to configure CN2 to allow only ">" conditions (i.e., disallow "<") for continuous variables?


Solution

  • I don't think you can prevent CN2 from using the same attribute more than once. In some cases you actually do need two conditions on one attribute, e.g. score > 40 and score < 50.

    In your case, however, the first condition (score > 40) is redundant, because score > 55 already implies it. I would suggest writing a post-pruning procedure that runs through the conditions of a rule (rule.filter.conditions), tries removing each condition in turn, and checks whether the new rule still covers the same examples as before. If it does, the condition can be dropped.

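    For the second question, there is no simple way to disallow specific conditions, such as "<" for all continuous variables. The best option is probably to implement a new validator class (learner.rule_finder.validator) that rejects rules containing unwanted conditions. The example below rejects rules with a "greater than" condition; to disallow "<" instead, compare against the corresponding operator constant. Something like this: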

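    import Orange

    class ConditionsValidator(Orange.core.RuleValidator):
        """Rejects rules that contain a 'greater than' condition."""
        def __call__(self, rule, data, weight_id, target_class, prior):
            for c in rule.filter.conditions:
                # Conditions on discrete attributes may lack `oper`, hence getattr.
                if getattr(c, "oper", None) == Orange.data.filter.ValueFilter.Greater:
                    return False  # mark the rule as invalid
            return True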
    

    Then set an instance of this class as the rule learner's validator:

    learner.rule_finder.validator = ConditionsValidator()
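
The post-pruning idea suggested for the first question can be illustrated without Orange at all. The following is only a minimal sketch under stated assumptions: a rule is represented as a plain list of (attribute, operator, threshold) tuples and the data as a list of dicts. The names `covers`, `covered_indices`, and `prune_redundant` are hypothetical helpers, not part of the Orange API; with Orange you would apply the same loop to rule.filter.conditions and compare the examples each candidate rule covers.

```python
import operator

# Hypothetical, library-independent rule representation: a list of
# (attribute, operator symbol, threshold) conditions joined by AND.
OPS = {">": operator.gt, "<": operator.lt, ">=": operator.ge, "<=": operator.le}


def covers(conditions, row):
    """Return True if the row satisfies every condition."""
    return all(OPS[op](row[attr], value) for attr, op, value in conditions)


def covered_indices(conditions, rows):
    """Indices of the rows covered by the conjunction of conditions."""
    return {i for i, row in enumerate(rows) if covers(conditions, row)}


def prune_redundant(conditions, rows):
    """Drop each condition whose removal leaves the covered set unchanged."""
    kept = list(conditions)
    target = covered_indices(kept, rows)
    i = 0
    while i < len(kept):
        candidate = kept[:i] + kept[i + 1:]
        if covered_indices(candidate, rows) == target:
            kept = candidate  # condition i was redundant; re-check this index
        else:
            i += 1
    return kept


# Made-up example data, chosen only to exercise the rule from the question.
rows = [
    {"score": 30, "amount": 50},
    {"score": 45, "amount": 80},
    {"score": 60, "amount": 90},
    {"score": 70, "amount": 150},
]
rule = [("score", ">", 40), ("amount", "<", 100), ("score", ">", 55)]
pruned = prune_redundant(rule, rows)
print(pruned)
```

Note that this pruning is relative to the data it is checked against: a condition is dropped only if the rule covers exactly the same examples without it. When two conditions are each redundant given the other, the one that appears first is removed.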