Search code examples
prolog

Remove adjacent duplicates in a list with Prolog


I found a similar post but it didn't work for me. So please don't try to redirect me to other links.

This is the result that I want:

removeadj([a,a,a,b,c,c,a,a],[a,b,c,a]).
>True.

I tried this code:

concatener([],R,R). 
concatener([X|Y],L,R):-concatener(Y,L,R1),R=[X|R1].

removeadj([],[]).
removeadj([X],[X]).
removeadj([X|Y],R):- Y=[X|L],removeadj(Y,R1),concatener([],R1,R).
removeadj([X|Y],R):- removeadj(Y,R1),concatener(X,R1,R).

When I try a list with one element duplicated many times, it works:

removeadj([a,a,a,a,a],[a]).
> True

But when I use different elements, it doesn't work:

removeadj([a,a,a,b],[a,b]). 
> False.

I do not see where the problem is, so I can't fix it. Please I need your help.


Solution

  • Relational names

    The first is to reconsider the name of your relation. Currently, it suggests that someone has to do something. removeadj that's a command. Quite an adequate name in a programming language where commands are the ruling metaphor. But not in Prolog.

    In Prolog, we have relations. A name that reflects this relationness is often very helpful. Why not list_list/2? After all, your relation is about two lists! OK, maybe that name was a bit too general. What about list__list_without_adjacent_elements/2? Lengthy, but relational. Maybe we shorten that to: list_noadjs/2. Note the s at the end: That means: it's a plural which means it's a list.

    Observe properties

    Before thinking about "doing" this or that. Rather muse about concrete examples - preferably ground examples, as you have given them. And about other properties. One observation is that all elements of the second list will be there in the first. In fact not only that. But they will also occur in the same order. Let me try to formulate that. Of course, that observation is not sufficient to write an entire predicate. But here comes the cool thing in Prolog: We do not need to implement everything. We can start with gross generalizations that contain all what we want plus some more.

    Start with a too general definition.

    To show you the most extreme, let's try:

    list_noadjs(_Xs, _Ys).
    

    This is the mother of all binary relations! That definition always succeeds, no matter what. Evidently, we will have to specialize it. Say, by looking at the second argument which is a list:

    list_noadjs(_Xs, []).
    list_noadjs(_Xs, [Y|Ys]) :-
       list_noadjs(_, Ys).
    

    If the list is [] so will be the original one. And both start with the same element!

    list_noadjs(Xs, []) :-
       Xs = [].
    list_noadjs(Xs, [Y|Ys]) :-
       Xs = [Y|_],
       list_noadjs(_, Ys).
    

    or more compactly:

    list_noadjs([], []).
    list_noadjs([Y|_Xs], [Y|Ys]) :-
       list_noadjs(_, Ys).
    

    Now, the first list contains elements of the second list. And in between something else:

    list_noadjs([], []).
    list_noadjs([Y|Xs0], [Y|Ys]) :-
       list_(Xs0,Xs1),
       list_noadjs(Xs1, Ys).
    
    list_(Xs,Xs).
    list_([_|Xs0],Xs) :-
       list_(Xs0,Xs).
    

    Is this already our relation? Let's give it a try:

    ?- list_noadjs("aaab",Ys).
       Ys = "aaab"
    ;  Ys = "aaa"
    ;  Ys = "aab"
    ;  Ys = "aa"
    ;  Ys = "aab"
    ;  Ys = "aa"
    ;  Ys = "ab"  % <===== * RRRIGHT !!!!***
    ;  Ys = "a"
    ;  false.
    

    (Btw. I am using library(double_quotes) to make answers more readable.)

    So we do have the expected solution. Alas, there are many incorrect solutions, too! We will have to continue to specialize this program:

    list_noadjs([], []).
    list_noadjs([Y|Xs0], [Y|Ys]) :-
       eq_list_(Y, Xs0,Xs1),
       list_noadjs(Xs1, Ys).
    
    eq_list_(_, Xs,Xs).
    eq_list_(Y, [Y|Xs0],Xs) :-
       eq_list_(Y, Xs0,Xs).
    

    Now this is much better, but still not perfect:

    ?- list_noadjs("aaab",Ys).
       Ys = "aaab"
    ;  Ys = "aab"
    ;  Ys = "aab"
    ;  Ys = "ab"    % !!! Right
    ;  false.
    

    We have to further specialize the program: After a sequence of identical elements, there must be something else:

    list_noadjs([], []).
    list_noadjs([Y|Xs0], [Y|Ys]) :-
       eq_list_(Y, Xs0,Xs1),
       nohead(Xs1, Y),
       list_noadjs(Xs1, Ys).
    
    eq_list_(_, Xs,Xs).
    eq_list_(Y, [Y|Xs0],Xs) :-
       eq_list_(Y, Xs0,Xs).
    
    nohead([], _X).
    nohead([X|_], Y) :-
       dif(X, Y).
    

    So that's our relation.

    Enjoy the relation!

    Seriously. Don't just use the test cases you had. You have now a relation! That's not a function in disguise, it is truly more than that. Try it out! Ask completely unusual things, like:

    ?- list_noadjs(Xs,"abc").
       Xs = "abc"
    ;  Xs = "abcc"
    ;  Xs = "abccc"
    ;  Xs = "abcccc"
    ; ... .
    

    So here we ask: Which lists correspond to "abc"? Note that only c is repeated! All the other solutions are hidden behind this wall of infinity. But we can play a little trick to get them:

    ?- length(Xs,N), list_noadjs(Xs,"abc").
       Xs = "abc", N = 3
    ;  Xs = "abcc", N = 4
    ;  Xs = "abbc", N = 4
    ;  Xs = "aabc", N = 4
    ;  Xs = "abccc", N = 5
    ;  Xs = "abbcc", N = 5
    ;  Xs = "abbbc", N = 5
    ;  Xs = "aabcc", N = 5
    ;  Xs = "aabbc", N = 5
    ;  Xs = "aaabc", N = 5
    ;  Xs = "abcccc", N = 6
    ; ... .
    

    Don't shy away from non-termation.

    We have already seen it: Very often, we get infinitely many solutions. And (I must admit) it's even worse: Sometimes, our relations do not even terminate.

    Here is one such example. Say, is there a way that the list in the second argument contains duplicates? Or that the following holds:

    ?- list_noadjs(Xs,[X,X]).
       loops.
    

    Prolog answers: mumble, mumble, lemme see...

    To master Prolog, you will have to understand this in detail. But for the moment, there is often a good way out:

    Specialize queries to get termination

    So instead of asking: Is there any term that may correspond to [X,X] we may ask: Is there a list of size 5 (or any other finite size). Now Prolog denies this:

     ?- Xs = [_,_,_,_,_], list_noadjs(Xs,[X,X]).
        false.
    

    That's not the universal answer you wanted. But it is better than nothing.

    Sometimes all these queries are too much for you. Let Prolog do the thinking for you by:

    Enumerating all solutions

    Often, this is very simple. Start with the most general query. The big advantage here is that no thinking is required on your side at all. And still you look like a pro:

    ?- list_noadjs(Xs,Ys).
       Xs = [], Ys = []
    ;  Xs = [_A], Ys = [_A]
    ;  Xs = [_A,_B], Ys = [_A,_B], dif(_B,_A)
    ;  Xs = [_A,_B,_C], Ys = [_A,_B,_C], dif(_B,_A), dif(_C,_B) ?
    ;  Xs = [_A,_B,_C,_D], Ys = [_A,_B,_C,_D], dif(_B,_A), dif(_C,_B), dif(_D,_C)
    ;  Xs = [_A,_B,_C,_D,_E], Ys = [_A,_B,_C,_D,_E], dif(_B,_A), dif(_C,_B), dif(_D,_C)
    ;  ... .
    

    What we got here are so called answers. A single answer may contain infinitely many solutions. Think of this: You are looking at infinity! Some conditions (called constraints) must hold, like dif/2. But that's it.

    The third answer was:

    Xs = [_A,_B], Ys = [_A,_B], dif(_B,_A)
    

    So Xs and Ys are the same list with two distinct elements. So this answer implies Xs = "ab", Ys = "ab" but also Xs = [1,2], Ys = [1,2] and many, many more.

    Even better, enumerate all answers in a systematic ("fair") manner:

    ?- length(Xs, N), list_noadjs(Xs,Ys).
       Xs = [], N = 0, Ys = []
    ;  Xs = [_A], N = 1, Ys = [_A]
    ;  Xs = [_A,_B], N = 2, Ys = [_A,_B], dif(_B,_A)
    ;  Xs = [_A,_A], N = 2, Ys = [_A]
    ;  Xs = [_A,_B,_C], N = 3, Ys = [_A,_B,_C], dif(_B,_A), dif(_C,_B)
    ;  Xs = [_A,_B,_B], N = 3, Ys = [_A,_B], dif(_B,_A)
    ;  Xs = [_A,_A,_B], N = 3, Ys = [_A,_B], dif(_B,_A)
    ;  Xs = [_A,_A,_A], N = 3, Ys = [_A]
    ;  Xs = [_A,_B,_C,_D], N = 4, Ys = [_A,_B,_C,_D], dif(_B,_A), dif(_C,_B), dif(_D,_C)
    ; ... .
    

    These are all solutions up to length 3. There are no other! We know this for sure because the last answer is already of size 4. So all solutions below are here already!

    Often looking at such answers is very helpful. For example, it permits you to rapidly detect errors (like in the other answer that was given previously). So, don't tell nobody about that trick.