Search code examples
relational-algebradividerelational-division

Seeking extended Divide operator explanation


I am reading about Codd’s Eight Original Operators in Inside Microsoft SQL Server 2008: T-SQL Querying by Itzik Ben-Gan, Lubor Kollar, Dejan Sarka, and Steve Kass and do not understand the Divide operator.

Quotes defining the Divide operator:

“A divisor relation is used to partition a dividend relation and produce a quotient relation. The quotient relation is made up of those values of one column from the dividend table for which the second column contains all of the values in the divisor.”

This statement is in agreement with Wikipedia's definition and example.

“The formula for the Divide operator includes three relations: a divide by b per c, where a is the dividend, b is the divisor, and c is the mediator relation. Let relation a have attributes A and relation b attributes B. The Divide operator returns a relation that includes of all tuples from divisor such that a tuple {A, B} appears in the mediator relation for all tuples from divisor relation.”

The diagram below is used to demonstrate this statement. I believe the relations are presented in the following order: dividend, divisor, mediator, and end result.

enter image description here

The second relation (the divisor) has {a, x}, {a, z}, {b, x} and {b, z} for tuples. My thought process follows that since there are tuples {b, x} and {b, z} that b should be included in the end result. I've check the book corrections on the book's website (linked at the beginning of this post) and am certain that I am wrong.

Why is the result of the diagram example a and not a and b?


Solution

  • Relational Division has always been a mess, and is likely to stay that way. It was originally invented as the means for relational query systems to be able to formulate/answer questions such as, e.g., "what is the list of customers who have subscribed ALL possible insurance policy types". That is, it was intended as the vehicle for formulating queries that involved some kind of universal quantification as a predicate determining the result set.

    Elaborating further on my customers/policies example, let's assume that the set of "ALL possible insurance policy types" is itself time-varying, i.e. over time, new policy types can emerge, while others can be discontinued. Let's further assume that "ALL possible insurance policy types" in a certain query, means, specifically, "ALL policy types that are currently open for subscrition by customers" (that is, the discontinued policy types are not part of this set of "ALL" types).

    Let's assume that this set of "ALL possible policy types" at a certain moment is {TYPE1, TYPE3}. TYPE2 has been discontinued. Let's also assume that customer ES still has a policy of type TYPE2, obviously dating from before when it was discontinued. Thus customer ES has policies of type {TYPE1, TYPE2, TYPE3}.

    Now answer the question whether this customer has "ALL policy types that are currently open for subscription". Your answer should be a firm 'yes'. You might get a sense of where this is going : relational division centers around comparison of two sets. One is a "comparand" (the customer's set of subscribed policy types), the other one is a "reference" (the set of policy types that are currently open for subscription).

    Now there are at least two useful comparisons that can be made between sets : one is for equality, the other is for set containment (subsetness). In some cases of querying you will want the equality test, while in others you will want the containment test. And the relational operator called 'division' (in whatever of its umpty flavours) does not allow to make this distinction. I gather it is this phenomenon that you are asking about, and the answer is simply that the choice is made by design, so to speak, and hardwired into the operator's definition. It makes the operator "useful" in those cases where its definition matches your needs, and useless in other cases.

    The sort-of good news is, when you have to spell out the SQL for a relational division, there won't be much difference between division-with-equality and division-with-containment (despite the fact that the algebra operator is, by definition, only one of the two, and the other one simply even does not have an algebraic operator). Main problem is that set equality itself is very messy to express in SQL, and that's not any less so "inside a relational division query" ...

    And then there are still all the valid points already made by philip. Read them, but very carefully.