I have this table in my database in oracle9i:
CREATE TABLE involved_in
(
rid number,
aid number NOT NULL,
fid number NOT NULL,
role varchar(80),
note varchar(80),
job varchar(35),
PRIMARY KEY(rid),
FOREIGN KEY(fid) REFERENCES production(pr_id),
FOREIGN KEY(aid) REFERENCES person(pid)
);
Which contains data about actors (aid) who worked in films (fid).
What I want to do is similar to working out the Bacon number
Except my kevin bacon is known for having the aid 517635.
I have no clue as to how to work out (only using SQL statements) the number of actors I have to connect a given actor (by another aid) with to find a connection to 517635.
The result of the query would be either a list of all the actors the given actor has to connect to get to my guy or just a number.
For that I thought I'd have to get first all the actors 517635 has worked with and those would have a 1 as a result for having worked directly with him. The table is not too big but big enough to make that inviable.
Examples, let's say Brad Pitt is my 517635. He worked with Angelina Jolie in Mr. And Mrs Smith, that would make her number 1. If Brad Pitt has never worked in any movie with Bruce Willis (let's say that's the case) but Angelina Jolie has been in one with him then Bruce Willis' number with respect to Brad Pitt would be 2.
In my query if the given number was Angelina's the result would be: "Brad Pitt 1" or simply "1" If the given number was Willis' the result would be: "Angelina Jolie Brad Pitt 2" or "2" Example of what is in the table:
INSERT INTO involved_in(rid, aid, fid, role, note, job) VALUES(1, 33, 1584953, 'Himself', 'NULL', 'actor');
INSERT INTO involved_in(rid, aid, fid, role, note, job) VALUES(2, 1135, 1999660, 'Himself', 'NULL', 'actor');
INSERT INTO involved_in(rid, aid, fid, role, note, job) VALUES(3, 1135, 2465724, 'Himself', 'NULL', 'actor');
INSERT INTO involved_in(rid, aid, fid, role, note, job) VALUES(4, 6003, 2387806, 'Himself', '(archive footage)', 'actor');
INSERT INTO involved_in(rid, aid, fid, role, note, job) VALUES(5, 13011, 1935123, 'Himself', 'NULL', 'actor');
I have nothing in my mind, I'm completely new to SQL and all I can think of leads to infinite loops with a variable to count the number of loops. Any ideas as to where to begin and, with luck, end?
This is difficult to verify without sample data, but the following answer is, at a minimum syntactically correct.
What you want here is clearly a recursive query. There are to ways to do this in Oracle: using common-table expressions (CTE) (i.e. a with
clause); or using the connect by
clause. CTEs are SQL standard and connect by
is proprietary, but I find connect by
less confusing, so that's what I'll use (also recursive CTEs are not supported in 9i).
SELECT aid, MIN (lvl) AS distance
FROM (SELECT ii2.aid, LEVEL AS lvl
FROM involved_in ii1
JOIN involved_in ii2
ON ii1.fid = ii2.fid AND ii1.aid <> ii2.aid
START WITH ii1.aid = 517635
CONNECT BY NOCYCLE ii1.aid = PRIOR ii2.aid)
GROUP BY aid
aid
that share a fid
to each other.connect by
clause joins each set of connected aid
to another set of aid
.nocycle
clause prevents infinite recursion.level
is keyword that gives us the number of times the recursion has occurred. We have to get the minimum because any given aid
could connect to the starting aid
via multiple paths.and level <= 100
to the connect by
clause (in this case, limiting the results to a distance of 100 or less).As pointed out in the comments, 9i doesn't support nocycle
. In addition, the OP is running out of temp space when running this query. The two are actually unrelated: if a cycle occurs, Oracle will throw an error; this implies that the server is running out of temp space before a cycle is found. I don't see any good way around either of these issues.
It is possible to specify an end point (to some degree). You can add AND ii1.aid <> 2
where 2
is the aid
of the end-point to the on
clause. This will cause the query to stop navigating a branch when it encounters that value. This may not help with the above issues however, as it will only short-circuit those branches where the provided values are found. It still has to evaluate all of the other branches in case the value is in them somewhere.