Search code examples
clipsexpert-systeminference-enginegeneralization

CLIPS - looking for match between random slots in multislot fields


Consider such situation. I have such templates:

(deftemplate MAIN::simplecause
   (multislot coraxinfo (type INTEGER) (default undefined))
   (multislot changeinfo (type SYMBOL) (default undefined)))

(deftemplate MAIN::finalcause   
   (multislot coraxinfo (type INTEGER) (default undefined))
   (multislot changeinfo (type SYMBOL) (default undefined)))

I know that in coraxinfo slot I will always have not more than 14 values (maybe less, but never more). I also now that in changeinfo multislot I will have not more than 13 values.

I am trying to write a rule which will find all possible matches between any facts I will have.

For instance:

(deffacts start
       (simplecause (coraxinfo 1 2 3) (changeinfo a b c))
       (simplecause (coraxinfo 7 8 2 3 9) (changeinfo d a b e))
       (simplecause (coraxinfo 2 3 10 13) (changeinfo f g a b z))
       (simplecause (coraxinfo 77 88 99 66) (changeinfo k m l s))
       (simplecause (coraxinfo 88 99 11 22) (changeinfo v k m w))
       (simplecause (coraxinfo 13 88 99) (changeinfo k m))
       (simplecause (coraxinfo 666 777 888) (changeinfo abc def))
       (simplecause (coraxinfo 666 111 222 888 333 444 555 777 999) (changeinfo abc 1a 2a 3a def 4a)))

I would need to get this (order of values in each multislot doesn't matter):

(finalcause (coraxinfo 2 3) (changeinfo a b))
(finalcause 88 99) (changeinfo k m)) 
(finalcause 666 777 888) (changeinfo abc def)) 

For now I have stopped on this function:

(defrule cause_generalization_initial1
   ?f1 <- (simplecause (coraxinfo $?coraxmatch1 $?coraxmatch2 $?coraxmatch3 $?coraxmatch4 $?coraxmatch5 $?coraxmatch6 $?coraxmatch7) (changeinfo $?pricematch1 $?pricematch2 $?pricematch3 $?pricematch4 $?pricematch5 $?pricematch6 $?pricematch7))
   ?f2 <- (simplecause (coraxinfo $?coraxmatch1 $?coraxmatch2 $?coraxmatch3 $?coraxmatch4 $?coraxmatch5 $?coraxmatch6 $?coraxmatch7) (changeinfo $?pricematch1 $?pricematch2 $?pricematch3 $?pricematch4 $?pricematch5 $?pricematch6 $?pricematch7))
   (test (neq ?f1 ?f2))
   (not (finalcause (coraxinfo $?coraxmatch1 $?coraxmatch2 $?coraxmatch3 $?coraxmatch4 $?coraxmatch5 $?coraxmatch6 $?coraxmatch7) (changeinfo $?pricematch1 $?pricematch2 $?pricematch3 $?pricematch4 $?pricematch5 $?pricematch6 $?pricematch7)))
   =>
   (assert (finalcause (coraxinfo $?coraxmatch1 $?coraxmatch2 $?coraxmatch3 $?coraxmatch4 $?coraxmatch5 $?coraxmatch6 $?coraxmatch7) (changeinfo $?pricematch1 $?pricematch2 $?pricematch3 $?pricematch4 $?pricematch5 $?pricematch6 $?pricematch7))))

It is a little bit clumsy, but as far as I remember $? means 'zero or more', so even if I will have less fields than I have specified in search patter it should work. I use up to 7 multislots in each of patterns since having 14 or 13 values as a maximum means that in worst case every second value in multislot will match to smth in other fact.

The issue that when I load the facts specified in deffacts CLIPS goes to a kind of infinite loop - it doesn't responds for a long time so I beleive I have made a mistake in my rule. Also this rule should kill the engine in case I will have couple of facts which are almost same with the difference only in one field. In such case it will produce terrible amount of matches between them. Any idea where I was wrong? I will really appreciate any suggestions.

UPDATE. If we are trying to take the approach of constructin (finalcause) facts by adding one value to coraxinfo and changeinfo slots at a time than I have currently stopped on these 2 rules:

Creates initial finalcause fact with one matching value in both multislots:

(defrule cause_generalization_initial
    ?f1 <- (simplecause (coraxinfo $? ?coraxmatch $?) (changeinfo $? ?changematch $?))
    ?f2 <- (simplecause (coraxinfo $? ?coraxmatch $?) (changeinfo $? ?changematch $?))
    (test (neq ?f1 ?f2))
    (not (finalcause (coraxinfo ?coraxmatch) (changeinfo ?changematch)))
    =>
    (assert (finalcause (coraxinfo ?coraxmatch) (changeinfo ?changematch)))

If we have any finalcause fact we try to check that all the multislot values in it are subset of everything before the ?coraxmatchafter value in both matching simplecause facts and assert an extended finalcause. I beleive this rule should be able to 'jump over the gaps' in matching simplecauses.

(defrule cause_generalization_advanced
?f1 <- (simplecause (coraxinfo $?coraxbefore1 ?coraxmatchafter $?) (changeinfo $?changebefore1 ?changematchafter $?))
?f2 <- (simplecause (coraxinfo $?coraxbefore2 ?coraxmatchafter $?) (changeinfo $?changebefore2 ?changematchafter $?))
(test (neq ?f1 ?f2))
(finalcause (coraxinfo $?finalcoraxbefore) (changeinfo $?finalchangebefore))
(test (and  (subsetp $?finalcoraxbefore $?coraxbefore1) (subsetp $?finalcoraxbefore $?coraxbefore2) 
        (subsetp $?finalchangebefore $?changebefore1) (subsetp $?finalchangebefore $?changebefore2)))
=>
(assert (finalcause (coraxinfo $?finalcoraxbefore ?coraxmatchafter) (changeinfo $?finalchangebefore ?changematchafter))))

I use the rules with these deffacts (notice that deffacts is different from the one above):

(deffacts start
       (simplecause (coraxinfo 1 2 3) (changeinfo a b c))
       (simplecause (coraxinfo 7 8 2 3 9) (changeinfo d a b e))
       (simplecause (coraxinfo 2 3 10 13) (changeinfo f g a b z))
       (simplecause (coraxinfo 77 88 99 66) (changeinfo k m l s))
       (simplecause (coraxinfo 88 99 11 22) (changeinfo v k m w))
       (simplecause (coraxinfo 13 88 99) (changeinfo k m))
       (simplecause (coraxinfo 666 777 888) (changeinfo abc def))
       (simplecause (coraxinfo 666 111 222 777 333 444 555 888 999) (changeinfo abc 1a 2a 3a def 4a)))

The issue here is that I expected that it will be able to produce the finalcause for 3 matching fields but it produces only finalcause facts with 2 matching fields and I don't get why. Shouldn't it notice that these 3 facts are falling into the second rule?

(simplecause (coraxinfo 666 777 888) (changeinfo abc def))
(simplecause (coraxinfo 666 111 222 777 333 444 555 888 999) (changeinfo abc 1a 2a 3a def 4a))
(finalcause (coraxinfo 666 888) (changeinfo abc def))

Output of both rules is:
f-0     (initial-fact)
f-1     (simplecause (coraxinfo 1 2 3) (changeinfo a b c))
f-2     (simplecause (coraxinfo 7 8 2 3 9) (changeinfo d a b e))
f-3     (simplecause (coraxinfo 2 3 10 13) (changeinfo f g a b z))
f-4     (simplecause (coraxinfo 77 88 99 66) (changeinfo k m l s))
f-5     (simplecause (coraxinfo 88 99 11 22) (changeinfo v k m w))
f-6     (simplecause (coraxinfo 13 88 99) (changeinfo k m))
f-7     (simplecause (coraxinfo 666 777 888) (changeinfo abc def))
f-8     (simplecause (coraxinfo 666 111 222 777 333 444 555 888 999) (changeinfo abc 1a 2a 3a def 4a))
f-9     (finalcause (coraxinfo 666) (changeinfo abc))
f-10    (finalcause (coraxinfo 666 888) (changeinfo abc def))
f-11    (finalcause (coraxinfo 666 777) (changeinfo abc def))
f-12    (finalcause (coraxinfo 666) (changeinfo def))
f-13    (finalcause (coraxinfo 777) (changeinfo abc))
f-14    (finalcause (coraxinfo 777 888) (changeinfo abc def))
f-15    (finalcause (coraxinfo 777) (changeinfo def))
f-16    (finalcause (coraxinfo 888) (changeinfo abc))
f-17    (finalcause (coraxinfo 888) (changeinfo def))
f-18    (finalcause (coraxinfo 88) (changeinfo k))
f-19    (finalcause (coraxinfo 88 99) (changeinfo k m))
f-20    (finalcause (coraxinfo 88) (changeinfo m))
f-21    (finalcause (coraxinfo 99) (changeinfo k))
f-22    (finalcause (coraxinfo 99) (changeinfo m))
f-23    (finalcause (coraxinfo 2) (changeinfo a))
f-24    (finalcause (coraxinfo 2 3) (changeinfo a b))
f-25    (finalcause (coraxinfo 2) (changeinfo b))
f-26    (finalcause (coraxinfo 3) (changeinfo a))
f-27    (finalcause (coraxinfo 3) (changeinfo b))

Solution

  • Using 7 multifield variables, you're creating a combinatoric explosion in the number of ways the pattern can be matched. Look at the number of ways in which just the first pattern in your rule can be matched:

    CLIPS> 
    (deftemplate simplecause
       (multislot coraxinfo)
       (multislot changeinfo))
    CLIPS> 
    (deftemplate  finalcause   
       (multislot coraxinfo)
       (multislot changeinfo))
    CLIPS>    
    (defrule cause_generalization_initial1
       (simplecause (coraxinfo $?coraxmatch1 $?coraxmatch2 $?coraxmatch3 $?coraxmatch4 $?coraxmatch5 $?coraxmatch6 $?coraxmatch7) 
                    (changeinfo $?pricematch1 $?pricematch2 $?pricematch3 $?pricematch4 $?pricematch5 $?pricematch6 $?pricematch7))
       =>)
    CLIPS> (assert (simplecause (coraxinfo) (changeinfo)))
    <Fact-1>
    CLIPS> (agenda)
    0      cause_generalization_initial1: f-1
    For a total of 1 activation.
    CLIPS> (modify 1 (coraxinfo a) (changeinfo 1))
    <Fact-2>
    CLIPS> (agenda)
    0      cause_generalization_initial1: f-2
       .
       .
       .
    0      cause_generalization_initial1: f-2
    For a total of 49 activations.
    CLIPS> (modify 2 (coraxinfo a b) (changeinfo 1 2))
    <Fact-3>
    CLIPS> (agenda)
    0      cause_generalization_initial1: f-3
       .
       .
       .
    0      cause_generalization_initial1: f-3
    For a total of 784 activations.
    CLIPS> (modify 3 (coraxinfo a b c) (changeinfo 1 2 3))
    <Fact-4>
    CLIPS> (agenda)
    0      cause_generalization_initial1: f-4
       .
       .
       .
    0      cause_generalization_initial1: f-4
    For a total of 7056 activations.
    
    
    CLIPS> (modify 4 (coraxinfo a b c d) (changeinfo 1 2 3 4))
    <Fact-5>
    CLIPS> (agenda)
    0      cause_generalization_initial1: f-5
       .
       .
       .
    0      cause_generalization_initial1: f-5
    For a total of 44100 activations.
    CLIPS>
    

    If the coraxinfo and changeinfo slots are empty there's only way the pattern can be matched and only one activation. If each slot contains a single value there's 7 different ways that each slot can be matched (a single value in each of the seven variables with the remaining variables empty). Between the two slots that means there's 49 different ways the pattern can be matched.

    Once you get to 4 values in each slot, there's 44100 different ways that single pattern can be matched. That means by adding a second pattern there's 44,100 * 44,100 combinations that need to be compared. That's 1,944,810,000 comparisons that need to be made for the assertion of a single fact and you've got 8 facts including one with 9 values in one slot and 6 in the other.

    This is not a problem you're going to solve with a single rule. Probably the best approach is to construct the finalcause facts one element at a time using multiple rules. For example, first determine that there are two facts with 666 in them and create a (finalcause (coraxinfo 666) (changeinfo)) fact. Then have a rule which determines that there are two facts both having all the values present in finalcause plus an additional value that is not present and add that value. For example, (finalcause (coraxinfo 666 777) (changeinfo)). You can then have rules that remove the intermediate results.

    You'll also want to construct the rules so that you don't generate permutations. For example, you don't want to generate all of these facts which are different but equivalent:

    (finalcause (coraxinfo 666 777 888) (changeinfo abc def))
    (finalcause (coraxinfo 666 888 777) (changeinfo abc def))
    (finalcause (coraxinfo 777 666 888) (changeinfo abc def))
    (finalcause (coraxinfo 777 888 666) (changeinfo abc def))
    (finalcause (coraxinfo 888 666 777) (changeinfo abc def))
    (finalcause (coraxinfo 888 777 666) (changeinfo abc def))
    (finalcause (coraxinfo 666 777 888) (changeinfo def abc))
    (finalcause (coraxinfo 666 888 777) (changeinfo def abc))
    (finalcause (coraxinfo 777 666 888) (changeinfo def abc))
    (finalcause (coraxinfo 777 888 666) (changeinfo def abc))
    (finalcause (coraxinfo 888 666 777) (changeinfo def abc))
    (finalcause (coraxinfo 888 777 666) (changeinfo def abc))
    

    To do this, I'd suggest sorting the values that you place in finalcause slots so there's a unique ordering for the equivalent facts.