Search code examples
sqlunit-testingprolog

Prolog to SQL: Any way to improve SQL code for unit tests and fix an edge case elegantly?


Inspired by this StackOverflow question:

Find mutual element in different facts in swi-prolog

We have the following

Problem statement

Given a database of "actors starring in movies" (starsin is the relation linking actor "bob" to movie "a" for example)

starsin(a,bob).
starsin(c,bob).

starsin(a,maria).
starsin(b,maria).
starsin(c,maria).

starsin(a,george).
starsin(b,george).
starsin(c,george).
starsin(d,george).

And given set of movies M, find those actors that starred in all the movies of M.

The question was initially for Prolog.

Prolog solution

In Prolog, an elegant solution involves the predicate setof/3, which collects possible variable instantiations into a set (which is really list without duplicate values):

actors_appearing_in_movies(MovIn,ActOut) :-
    setof(
        Ax,
        MovAx^(setof(Mx,starsin(Mx,Ax),MovAx), subset(MovIn,MovAx)),
        ActOut
    ).    

I won't go into details about this, but let's look at the test code, which is of interest here. Here are five test cases:

actors_appearing_in_movies([],ActOut),permutation([bob, george, maria],ActOut),!. 
actors_appearing_in_movies([a],ActOut),permutation([bob, george, maria],ActOut),!.
actors_appearing_in_movies([a,b],ActOut),permutation([george, maria],ActOut),!.
actors_appearing_in_movies([a,b,c],ActOut),permutation([george, maria],ActOut),!.
actors_appearing_in_movies([a,b,c,d],ActOut),permutation([george],ActOut),!.

A test is a call to the predicate actors_appearing_in_movies/2, which is given the input list of movies (e.g. [a,b]) and which captures the resulting list of actors in ActOut.

Subsequently, we just need to test whether ActOut is a permutation of the expected set of actors, hence for example:

permutation([george, maria],ActOut)`

"Is ActOut a list that is a permutation of the list [george,maria]?.

If that call succeeds (think, doesn't return with false), the test passes.

The terminal ! is the cut operator and is used to tell the Prolog engine to not reattempt to find more solutions, because we are good at that point.

Note that for the empty set of movies, we get all the actors. This is arguably correct: every actors stars in all the movies of the empty set (Vacuous Truth).

Now in SQL.

This problem is squarely in the domain of relational algebra, and there is SQL, so let's have a go at this. Here, i'm using MySQL.

First, set up the facts.

DROP TABLE IF EXISTS starsin;

CREATE TABLE starsin (movie CHAR(20) NOT NULL, actor CHAR(20) NOT NULL);

INSERT INTO starsin VALUES
   ( "a" , "bob" ),
   ( "c" , "bob" ),
   ( "a" , "maria" ),
   ( "b" , "maria" ),
   ( "c" , "maria" ),
   ( "a" , "george" ),
   ( "b" , "george" ),
   ( "c" , "george" ),
   ( "d",  "george" );

Regarding the set of movies given as input, giving them in the form of a (temporary) table sounds natural. In MySQL, "temporary tables" are local to the session. Good.

DROP TABLE IF EXISTS movies_in;
CREATE TEMPORARY TABLE movies_in (movie CHAR(20) NOT NULL);
INSERT INTO movies_in VALUES ("a"), ("b");

Approach:

The results can now be obtained by getting, for each actor, the intersection of the set of movies denoted by movies_in and the set of movies in which an actor ever appeared (created for each actor via the inner join), then counting (for each actor) whether the resulting set has at least as many entries as the set movies_in.

Wrap the query into a procedure for practical reasons. A delimiter is useful here:

DELIMITER $$

DROP PROCEDURE IF EXISTS actors_appearing_in_movies;

CREATE PROCEDURE actors_appearing_in_movies()
BEGIN

SELECT 
     d.actor 
   FROM 
     starsin d, movies_in q
   WHERE 
     d.movie = q.movie 
   GROUP BY 
     actor 
   HAVING 
     COUNT(*) >= (SELECT COUNT(*) FROM movies_in);

END$$

DELIMITER ;

Run it!

Problem A appears:

Is there a better way than edit + copy-paste table creation code, issue a CALL and check the results "by hand"?

DROP TABLE IF EXISTS movies_in;
CREATE TEMPORARY TABLE movies_in (movie CHAR(20) NOT NULL);
CALL actors_appearing_in_movies();

Empty set!

Problem B appears:

The above is not desired, I want "all actors", same as for the Prolog solution. As I do not want to tack a weird edge case exception onto the code, my approach must be wrong. Is there one which naturally covers this case but doesn't become too complex? T-SQL and PostgreSQL one-liners are fine too!

The other test cases yield expected data:

DROP TABLE IF EXISTS movies_in;
CREATE TEMPORARY TABLE movies_in (movie CHAR(20) NOT NULL);
INSERT INTO movies_in VALUES ("a"), ("b");
CALL actors_appearing_in_movies();
+--------+
| actor  |
+--------+
| george |
| maria  |
+--------+

DROP TABLE IF EXISTS movies_in;
CREATE TEMPORARY TABLE movies_in (movie CHAR(20) NOT NULL);
INSERT INTO movies_in VALUES ("a"), ("b"), ("c");
CALL actors_appearing_in_movies();
+--------+
| actor  |
+--------+
| george |
| maria  |
+--------+

DROP TABLE IF EXISTS movies_in;
CREATE TEMPORARY TABLE movies_in (movie CHAR(20) NOT NULL);
INSERT INTO movies_in VALUES ("a"), ("b"), ("c"), ("d");
CALL actors_appearing_in_movies();
+--------+
| actor  |
+--------+
| george |
+--------+

Solution

  • And given set of movies M, find those actors that starred in all the movies of M.

    I would use:

    select si.actor
    from starsin si
    where si.movie in (<M>)
    group by si.actor
    having count(*) = <n>;
    

    If you have to deal with an empty set, then you need a left join:

    select a.actor
    from actors a left join
         starsin si
         on a.actor = si.actor and si.movie in (<M>)
    group by a.actor
    having count(si.movie) = <n>;
    

    <n> here is the number of movies in <M>.

    Update: The second approach in extended form

    create or replace temporary table 
       actor (actor char(20) primary key)
       as select distinct actor from starsin;
    
    select 
       a.actor,
       si.actor,si.movie  -- left in for docu
    from 
       actor a left join starsin si
         on a.actor = si.actor 
            and si.movie in (select * from movies_in)
    group 
       by a.actor
    having
       count(si.movie) = (select count(*) from movies_in);
    

    Then for empty movies_in:

    +--------+-------+-------+
    | actor  | actor | movie |
    +--------+-------+-------+
    | bob    | NULL  | NULL  |
    | george | NULL  | NULL  |
    | maria  | NULL  | NULL  |
    +--------+-------+-------+
    

    and for this movies_in for example:

    +-------+
    | movie |
    +-------+
    | a     |
    | b     |
    +-------+
    

    movie here is the top of the group:

    +--------+--------+-------+
    | actor  | actor  | movie |
    +--------+--------+-------+
    | george | george | a     |
    | maria  | maria  | a     |
    +--------+--------+-------+