list prolog unions functional-dependencies

Prolog union fails

I'm trying to understand the use of union (the built in predicate) in Prolog. In many cases it seems to fail when it should succeed. It seems it has something to do with the order of the elements of the lists. All of the below cases fail (they come back with "false.").

?- union([1,2,3],[],[2,3,1]).  
?- union([1,5,3], [1,2], [1,5,3,2]). 
?- union([4,6,2,1], [2], [1,2,4,6]).
?- union([1,2], [], [2,1]).

Shouldn't all of these be true? Any explanation as to why these cases keep failing would be very helpful.

Also: Why does the below not succeed and find the correct list for A?

 ?- union([1,5,3], A, [4,1,5,3,2]).  /** comes back with "fail." */

Solution

There are a couple of issues here. Declarative and procedural ones. Let's start with the declarative ones, they are really sitting a bit deeper. The procedural aspects can be handled easily with appropriate programming techniques, as in this answer.

When we consider declarative properties of a predicate, we consider its set of solutions. So we pretend that all we care about is what solutions the predicate will describe. We will completely ignore how all of this is implemented. For very simple predicates, that's a simple enumeration of facts - just like a database table. It is all obvious in such situations. It becomes much more unintuitive if the set of solutions is infinite. And this happens so easily. Think of the query

?- length(Xs,1).

This harmless looking query asks for all lists of length one. All of them! Let me count - that's infinitely many!

Before we look at the actual answer Prolog produces, think what you would do in such a situation. How would you answer that query? Some of my feeble attempts

?- length(Xs,1).
   Xs = [1]
;  Xs = [42]
;  Xs = [ben+jerry]
;  Xs = [feel([b,u,r,n])]
;  Xs = [cromu-lence]
;  Xs = [[[]]]
;  ... . % I am running out of imagination

Should Prolog produce all those infinitely many values? How much time would this take? How much time do you have to stare at walls of text? Your lifetime is clearly not enough.

Taming the number of solutions, from solutions to answers

There is a way out: The logic variable!

?- length(Xs, 1).
   Xs = [_A].
%     ^^

This little _A permits us to collapse all strange solutions into a single answer!

So here we really had a lot of luck: we tamed the infinity with this nice variable.

Now back to your relation. There, we want to represent sets as lists. Lists are clearly not sets per se. Consider the list [a,a] and the list [a]. While they are different, they are meant to represent the same set. Think of it: How many alternate representations are there for [a]? Yep, infinitely many. But now, the logic variable cannot help us to represent all of them compactly¹. Thus we have to enumerate them one-by-one. But if we have to enumerate all those answers, practically all queries will not terminate due to infinitely many solutions to enumerate explicitly. OK, some still will:

?- union([], [], Xs).
   Xs = [].

And all ground queries. And all failing queries. But once we have a variable like

?- union([a], [], Xs).
   Xs = [a]
;  Xs = [a,a]
;  Xs = [a,a,a]
;  ... .

we already are deep into non-termination.

So given that, we have to make some decisions. We somehow need to tame that infinity. One idea is to consider a subset of the actual relation that leans somehow to a side. If we want to ask questions like union([1,2],[3,4], A3) then it is quite natural to impose a subset where we have this functional dependency

A1, A2 → A3

With this functional dependency we now determine exactly one value for A3 for each pair of A1, A2. Here are some examples:

?- union([1,5,3], [1,2], A3).
   A3 = [5,3,1,2].
?- union([1,2,3], [], A3).
   A3 = [1,2,3].

Note that Prolog always puts a . a the end. That means Prolog says:

Dixi! I have spoken. There are no more solutions.

(Other Prologs will moan "No" at the end.) As a consequence, the queries (from your comments) now fail:

?- union([1,5,3], [1,2], [1,5,3,2]).
   false.
?- union([1,2,3],[],[2,3,1]).
   false.

So imposing that functional dependency now restricts the set of solutions drastically. And that restriction was an arbitrary decision of the implementer. It could have been different! Sometimes, duplicates are removed, sometimes not. If A1 and A2 both are duplicate free lists, the result A3 will be duplicate free, too.

After looking into its implementation, the following seems to hold (you do not need to do this, the documentation should be good enough - well it isn't): The elements in the last argument are structured as follows and in that order:

The elements of A1 that do not occur in A2, too. In the relative order of A1.
All elements of A2 in their original order.

So with this functional dependency further properties have been sneaked in. Such as that A2 is always a suffix of A3! Consequently the following cannot be true, because there is no suffix of A3 that would make this query true:

?- union([1,5,3], A2, [4,1,5,3,2]).
   false.

And there are even more irregularities that can be described on a declarative level. Often, for the sake of efficiency, relations are too general. Like:

?- union([],non_list,non_list).

Such concerns are often swiped away by noting that we are only interested in goals with arguments that are either lists (like [a,b]) or partial lists (like [a,b|Xs]).

Anyway. We finally have now described all the declarative properties we expect. Now comes the next part: That relation should be implemented adequately! There again a new bunch of problems awaits us!

With library(lists) of SWI, I get:

?-        union([1,2], [X], [1,2,3]).
   false.

?- X = 3, union([1,2], [X], [1,2,3]).
   X = 3.

Which is really incorrect: This can only be understood procedurally, looking at the actual implementation. This no longer is a clean relation. But this problem can be fixed!

You can avoid the correctness issues altogether by sticking to the pure, monotonic subset of Prolog. See above for more.

_{_{1) To tell the truth, it would be possible to represent that infinite set with some form of constraints. But the mere fact that there is not a single library for sets provided by current Prolog systems should make it clear that this is not an obvious choice.}}