Search code examples
prolog

Prolog, how to limit "generate and test" when generating an infinite sequence


I'm trying to write a predicate which "eliminates consecutive duplicates of list elements". My predicate looks like this:

compress([], []).
compress([X], [X]).
compress([X, X|Unpacked], [X|Rest]) :-
  % pull an element off Unpacked
  compress([X|Unpacked], [X|Rest]).
compress([X, Y|Unpacked], [X|Rest]) :-
  dif(X, Y), % we've reached the end of this subsequence
  compress([Y|Unpacked], Rest).

And it works great for mostly-instantiated queries:

compress([a, b], [a, b]) -> true
compress([a, a, b], [a, b]) -> true
compress([a, b, Z], X) -> Z = b, X = [a, b] OR X = [a, b, Z], dif(Z, b)

I just started learning Prolog this week, that last one is pretty cool!

This predicate, however, gives unsatisfactory results when used differently:

compress(X, [a, b]) -> never terminates

?- compress2(X, Y).
X = Y, Y = [] ;
X = Y, Y = [_904] ;
X = [_904, _904],
Y = [_904] ;
X = [_904, _904, _904],
Y = [_904] ;
X = [_904, _904, _904, _904],
Y = [_904] ;
X = [_904, _904, _904, _904, _904],
Y = [_904] .
(etc)

So, I wrote a wrapper around it which is better for those kinds of queries, it uses "generate and test" to exhaustively try all options for a given size of list before moving on to larger lists:

sizes(Large, Small) :-
  between(0, inf, Large),
  between(0, Large, Small).

compress2(Unpacked, Packed) :-
  sizes(Large, Small),
  length(Unpacked, Large),
  length(Packed, Small),
  compress(Unpacked, Packed).

This new predicate works great for general queries:

?- compress2(X, [a, b]).
X = [a, b] ;
X = [a, a, b] ;
X = [a, b, b] ;
X = [a, a, a, b] ;
X = [a, a, b, b] ;
(etc)

?- compress2(X, Y).
X = Y, Y = [] ;
X = Y, Y = [_658] ;
X = [_658, _658], Y = [_658] ;
X = Y, Y = [_1162, _1168], dif(_1162, _1168) ;
X = [_658, _658, _658], Y = [_658] ;

Both of the above eventually generate every alternative.

However, this predicate fails on the original queries:

compress2([a, b], [a, b]). -> returns true, then loops forever

It loops forever because I'm doing "generate and test", and the sizes predicate generates every size. This works great until we find the answer, but once we've found the answer we backtrack and then generate an infinite number of sizes which are guaranteed to never work, they're too big.

So my question is, is there a pure way to make a compress predicate which works for every query? Everything I try fixes one type of query while breaking others.


Solution

  • Here is a general approach to such problems:

    1 Take relational names

    You are using an imperative name for a relation. Try to use names that reflect what is most interesting in the relation and thus not a command. A query like compress(Xs, [a,b]) will not compress anything. So why call it like that? At first you might think that "you can handle it", well at least I can't. To start with, take the type of each argument, like list_list/2, and then refine it, since the name is not specific enough (there are many relations between two lists, after all). I will go with: list_compressed([a,a,a,b,b,b,b,b],[a,b]).

    2 Consider the set of solutions

    That's also called the declarative semantics. Thus consider the set of all true ground queries. And see if there is some interesting structure in there. Here are some:

    2.1 Functional dependency

    ... between the first and second argument. That is: For each ground term for the first argument there is (at most) one ground term for the second argument.

    However, this does not hold in the other direction! In fact, for each ground term for the second argument (that holds), there are infinitely many ground terms for the first argument!

    One of the nice consequences of this functional dependency is that the size of the second argument is always bound by the first. Thus no need for blind enumeration of the 2nd arguments as you do.

    2.2 Types

    Both arguments have to be lists. There does not seem any space for overgeneralizations as for example in append([], non_list, non_list).

    2.3 Finiteness

    The set of solutions for list_compressed/2 is clearly infinite. But there are two independent sources of infinity. One is caused by the length of lists, and the other is caused by the arbitrariness of the arguments. The goal list_compressed([A,A],[A]) holds for infinitely many terms. But those terms A are completely unconstrained. So in a sense, the answer is finite, whereas the set of solutions is still infinite.


    All these purely declarative notions already constrain the procedural code we can write — without ever interfering with it directly!

    3 Understand termination

    In particular, you need to understand the difference between (universal) termination and finding an answer (sometimes called existential termination). See this answer for more.

    You kind-of complained that compress(X, [a, b]) never terminates. Well, X has an infinite set of solutions that cannot be expressed compactly like X = [a,a,b] ; X = [a,a,a,b] ... Prolog tries to enumerate that set, but alas, it starts with the largest answer first!

    Also, the first thing to look at (in your actual code), is the following :

    sizes(Large, Small) :-
      between(0, inf, Large), false.  % SWI-specific, rather use length(_, Large)
      between(0, Large, Small).
    
    compress2(Unpacked, Packed) :-
      sizes(Large, Small), false,
      length(Unpacked, Large),
      length(Packed, Small),
      compress(Unpacked, Packed).
    

    This alone does not terminate and thus your original program will not terminate. Never, ever! To see this, look at Unpacked and Packed. In this fragment nobody is interested in those at all. Everything that interests those variables is behind false and thus irrelevant. So it is as if you had used _ in their stead.

    Since Packed is already constrained by Unpacked but not vice versa, simply add the goal length(Unpacked, _) in front. That's all you can do (with reasonable effort).

    So we have:

    list_compressed2(List, Compressed) :-
       length(List, _),
       compress(List, Compressed).
    

    That is fine for all cases where there are answers/solutions. It still can be improved for cases where there is no solution. Like

    ?- list_compressed2(_, [a,a|_]).
       loops.
    

    which should fail (and thus terminate), but does not.