Search code examples
listalgorithmprologsublist

Most common sublist in Prolog


The problem is as follows: Write a predicate in Prolog most_common_sublist(L1,N,L2) that will find the sublist L2 with length N such that it is the most common sublist in L1.

//Example 1:
?- most_common_sublist([1,2,2,3,2,2,4,2,2,3],1,L).
L=[2];

//Example 2:
?- most_common_sublist([1,2,2,3,2,2,4,2,2,3],2,L).
L=[2,2];

//Example 3:
?- most_common_sublist([1,2,2,3,2,2,4,2,2,3],3,L).
L=[2,2,3];

My approach was to generate all the possible sublists of size N using the generator predicate, check which of those is the most common one in the list using the check predicate, and then just put that as my result. The reason why I'm not using the built-in predicates for length and add is because I'm supposed to write my own. My generator predicate works, it gives out the correct output.

?- generator([1,2,2,3,2,2,4,2,2,3],3,L).
L = [[1, 2, 2], [2, 2, 3], [2, 3, 2], [3, 2, 2], [2, 2, 4], [2, 4, 2], [4, 2|...], [2|...]] [write]
L = [[1, 2, 2], [2, 2, 3], [2, 3, 2], [3, 2, 2], [2, 2, 4], [2, 4, 2], [4, 2, 2], [2, 2, 3]] 

I checked all my predicates and they all seem to work (at least for the test cases I'm using), the problem occurs with the check predicate. It seems to work fine until it gets to N>=P (when this is NOT true, works fine when it is true). I expect the program to go onto the next check predicate under it (the third check predicate) so that it stores Temp value in Result instead of the H value. For some reason it does not go to the third check predicate (I checked with debugger), instead it does something weird (I can't figure out what).

most_common_sublist(L,N,Result):-generator(L,N,LOP),check(LOP,_,Temp),add(Temp,[],Result).

add([],L,L).
add([X|L1],L2,[X|L3]):-add(L1,L2,L3).

length([],0).
length([X|O],N):-length(O,M),N is M+1.

sublist([H|_],1,[H]).
sublist([H|T],N,[H|LOP]):-M is N-1,sublist(T,M,LOP).

generator(L,N,[L]):-length(L,M),N=:=M.
generator([H|T],N,LOP):-sublist([H|T],N,PN),generator(T,N,LP),add([PN],LP,LOP).

check([],Z,K):-Z is 0,add([],[],K).
check([H|T],Hits,Result):-check_how_many(H,[H|T],N),check(T,P,_),N>=P,Hits is N,add(H,[],Result).
check([H|T],Hits,Result):-check_how_many(H,[H|T],N),check(T,P,Temp),Hits is P,add(Temp,[],Result).

check_how_many(X,[X],1).
check_how_many(_,[_],0).
check_how_many(Pattern,[H|T],Hits):-same(Pattern,H),check_how_many(Pattern,T,P),Hits is P+1.
check_how_many(Pattern,[_|T],Hits):-check_how_many(Pattern,T,P),Hits is P.

same([], []).
same([H1|R1], [H2|R2]):-
    H1 = H2,
    same(R1, R2).

Solution

  • Since I'm not familiar with your code I rewrote it with similar functionality. Lines followed by %here are my improvements (2 times used). For simplicity I used the inbuild predicates length/2 and append/3 instead of add/3. sublist/3 has a complete different code but same functionality, same/2 is not necessary at all. Most uses of you add/3 were not necessary as well as some equality statements.

    most_common_sublist(L,N,Temp):-
        generator(L,N,LOP),
        check(LOP,_,Temp).
    
    sublist(L,N,S):-
        length(S,N),
        append(S,_,L).
    
    generator(L,N,[L]):-
        length(L,N).
    generator([H|T],N,LOP):-
        sublist([H|T],N,PN),
        generator(T,N,LP),
        append([PN],LP,LOP).
    
    check([],0,[]).
    check([H|T],N,H):-
        check_how_many(H,[H|T],N),
        check(T,P,_),
        N>=P.
    check([H|T],P,Temp):-
        check_how_many(H,[H|T],N),
        check(T,P,Temp)
    %here
        , N=<P
        .
    
    check_how_many(X,[X],1).
    check_how_many(_,[_],0).
    check_how_many(H,[H|T],Hits):-
        check_how_many(H,T,P),
        Hits is P+1.
    check_how_many(Pattern,[H|T],P):-
    %here
        Pattern \== H,
        check_how_many(Pattern,T,P).
    

    After giving up on tracing I just used the following call to debug after enabling long output (
    ?- set_prolog_flag(answer_write_options,[max_depth(100)]). ):

    ?- findall(Temp,check([[1, 2, 2], [2, 2, 1]],_,Temp),Out).
    

    Initial output was

    Out = [[1,2,2],[1,2,2],[1,2,2],[1,2,2],[1,2,2],[1,2,2],[1,2,2],[2,2,1],[2,2,1],[],[],[2,2,1],[2,2,1],[],[]].
    

    Which contains way to much empty lists. First fix (%here) was to set the condition N=<P for the last check/3 case. Until now it was possible to choose a P lower than N, which should be covered by the 2nd check/3 case. Output changed to

    Out = [[1,2,2],[1,2,2],[1,2,2],[1,2,2],[1,2,2],[2,2,1],[2,2,1],[2,2,1],[]].
    

    Better, but still empty lists possible. A similar case happens in the last check_how_many/3 case: you have to state that H and Pattern are different, otherwise it would be possible for a fitting Pattern not to be counted. Lets check the output

    Out = [[1,2,2],[1,2,2],[1,2,2],[2,2,1]].
    

    Way better. Lets check another case:

    ?- findall(Temp,check([[1, 2, 2], [1, 2, 2], [2, 2, 1]],_,Temp),Out).
    Out = [[1,2,2],[1,2,2],[1,2,2],[1,2,2]].
    
    ?- findall(Temp,check([[1, 2, 2], [2, 2, 2], [1, 2, 2]],_,Temp),Out).
    Out = [[1,2,2],[1,2,2],[1,2,2],[1,2,2],[1,2,2],[1,2,2],[1,2,2],[1,2,2],[2,2,2],[2,2,2],[2,2,2],[1,2,2]].
    

    Works... Almost.

    So the problem seems to be check_how_many/3: alter

    check_how_many(_,[_],0).
    

    to

    check_how_many(_,[],0).
    

    and you should be fine.

    ?- findall(Temp,check([[1, 2, 2], [2, 2, 2], [1, 2, 2]],_,Temp),Out).
    Out = [[1,2,2],[1,2,2],[1,2,2],[1,2,2],[1,2,2],[1,2,2],[1,2,2],[1,2,2]].
    

    Since it is way more fun to write the code yourself than to debug foreign code I'll add another answer with my attempt.