Inspired by this StackOverflow question:
Find mutual element in different facts in swi-prolog
We have the following
Given a database of "actors starring in movies" (starsin is the relation linking actor "bob" to movie "a" for example)
starsin(a,bob). starsin(c,bob). starsin(a,maria). starsin(b,maria). starsin(c,maria). starsin(a,george). starsin(b,george). starsin(c,george). starsin(d,george).
And given set of movies M, find those actors that starred in all the movies of M.
The question was initially for Prolog.
In Prolog, an elegant solution involves the predicate
setof/3
,
which collects possible variable instantiations into a set (which is really list without
duplicate values):
actors_appearing_in_movies(MovIn,ActOut) :-
setof(
Ax,
MovAx^(setof(Mx,starsin(Mx,Ax),MovAx), subset(MovIn,MovAx)),
ActOut
).
I won't go into details about this, but let's look at the test code, which is of interest here. Here are five test cases:
actors_appearing_in_movies([],ActOut),permutation([bob, george, maria],ActOut),!.
actors_appearing_in_movies([a],ActOut),permutation([bob, george, maria],ActOut),!.
actors_appearing_in_movies([a,b],ActOut),permutation([george, maria],ActOut),!.
actors_appearing_in_movies([a,b,c],ActOut),permutation([george, maria],ActOut),!.
actors_appearing_in_movies([a,b,c,d],ActOut),permutation([george],ActOut),!.
A test is a call to the predicate actors_appearing_in_movies/2
, which is given
the input list of movies (e.g. [a,b]
) and which captures the resulting list of
actors in ActOut
.
Subsequently, we just need to test whether ActOut
is a permutation of the expected
set of actors, hence for example:
permutation([george, maria],ActOut)`
"Is ActOut
a list that is a permutation of the list [george,maria]
?.
If that call succeeds (think, doesn't return with false
), the test passes.
The terminal !
is the cut operator and is used to tell the Prolog engine to not
reattempt to find more solutions, because we are good at that point.
Note that for the empty set of movies, we get all the actors. This is arguably correct: every actors stars in all the movies of the empty set (Vacuous Truth).
This problem is squarely in the domain of relational algebra, and there is SQL, so let's have a go at this. Here, i'm using MySQL.
First, set up the facts.
DROP TABLE IF EXISTS starsin;
CREATE TABLE starsin (movie CHAR(20) NOT NULL, actor CHAR(20) NOT NULL);
INSERT INTO starsin VALUES
( "a" , "bob" ),
( "c" , "bob" ),
( "a" , "maria" ),
( "b" , "maria" ),
( "c" , "maria" ),
( "a" , "george" ),
( "b" , "george" ),
( "c" , "george" ),
( "d", "george" );
Regarding the set of movies given as input, giving them in the form of a (temporary) table sounds natural. In MySQL, "temporary tables" are local to the session. Good.
DROP TABLE IF EXISTS movies_in;
CREATE TEMPORARY TABLE movies_in (movie CHAR(20) NOT NULL);
INSERT INTO movies_in VALUES ("a"), ("b");
The results can now be obtained by getting, for each actor, the intersection of the set of
movies denoted by movies_in
and the set of movies in which an actor ever appeared
(created for each actor via the inner join), then counting (for each actor) whether the
resulting set has at least as many entries as the set movies_in
.
Wrap the query into a procedure for practical reasons. A delimiter is useful here:
DELIMITER $$
DROP PROCEDURE IF EXISTS actors_appearing_in_movies;
CREATE PROCEDURE actors_appearing_in_movies()
BEGIN
SELECT
d.actor
FROM
starsin d, movies_in q
WHERE
d.movie = q.movie
GROUP BY
actor
HAVING
COUNT(*) >= (SELECT COUNT(*) FROM movies_in);
END$$
DELIMITER ;
Run it!
Problem A appears:
Is there a better way than edit + copy-paste table creation code,
issue a CALL
and check the results "by hand"?
DROP TABLE IF EXISTS movies_in;
CREATE TEMPORARY TABLE movies_in (movie CHAR(20) NOT NULL);
CALL actors_appearing_in_movies();
Empty set!
Problem B appears:
The above is not desired, I want "all actors", same as for the Prolog solution. As I do not want to tack a weird edge case exception onto the code, my approach must be wrong. Is there one which naturally covers this case but doesn't become too complex? T-SQL and PostgreSQL one-liners are fine too!
The other test cases yield expected data:
DROP TABLE IF EXISTS movies_in;
CREATE TEMPORARY TABLE movies_in (movie CHAR(20) NOT NULL);
INSERT INTO movies_in VALUES ("a"), ("b");
CALL actors_appearing_in_movies();
+--------+
| actor |
+--------+
| george |
| maria |
+--------+
DROP TABLE IF EXISTS movies_in;
CREATE TEMPORARY TABLE movies_in (movie CHAR(20) NOT NULL);
INSERT INTO movies_in VALUES ("a"), ("b"), ("c");
CALL actors_appearing_in_movies();
+--------+
| actor |
+--------+
| george |
| maria |
+--------+
DROP TABLE IF EXISTS movies_in;
CREATE TEMPORARY TABLE movies_in (movie CHAR(20) NOT NULL);
INSERT INTO movies_in VALUES ("a"), ("b"), ("c"), ("d");
CALL actors_appearing_in_movies();
+--------+
| actor |
+--------+
| george |
+--------+
And given set of movies M, find those actors that starred in all the movies of M.
I would use:
select si.actor
from starsin si
where si.movie in (<M>)
group by si.actor
having count(*) = <n>;
If you have to deal with an empty set, then you need a left join
:
select a.actor
from actors a left join
starsin si
on a.actor = si.actor and si.movie in (<M>)
group by a.actor
having count(si.movie) = <n>;
<n>
here is the number of movies in <M>
.
create or replace temporary table
actor (actor char(20) primary key)
as select distinct actor from starsin;
select
a.actor,
si.actor,si.movie -- left in for docu
from
actor a left join starsin si
on a.actor = si.actor
and si.movie in (select * from movies_in)
group
by a.actor
having
count(si.movie) = (select count(*) from movies_in);
Then for empty movies_in
:
+--------+-------+-------+
| actor | actor | movie |
+--------+-------+-------+
| bob | NULL | NULL |
| george | NULL | NULL |
| maria | NULL | NULL |
+--------+-------+-------+
and for this movies_in
for example:
+-------+
| movie |
+-------+
| a |
| b |
+-------+
movie
here is the top of the group:
+--------+--------+-------+
| actor | actor | movie |
+--------+--------+-------+
| george | george | a |
| maria | maria | a |
+--------+--------+-------+