Search code examples
databasedatabase-designrelational-databasedatabase-schemadatamodel

Data Modeling: Supertype / Subtype


Looking to figure out the proper way to model the below requirements.

  1. There are 3 types of “parties” to be concerned with, a Fan, a Band, and a BandMember.
  2. That BandMember will always be associated with a Band and can also be a Fan of any band.
  3. There are common attributes between a Fan, a Band, and a BandMember, but each of these 3 will also have their own unique attributes.
  4. A Fan can be a fan of of any Band or none at all

This is a small part of a bigger thought but it is creating confusion in expanding the model. I believe it would have to be diagram 2 or some other option since I don't see how a BandMember can be associated with a Band in the first model.

I appreciate any input.

alt text

alt text


Solution

  • Caveat

    1. First a couple of caveats for understanding the limitations. All the data being used or stored needs to be considered/modelled together. Eg. you have found that out anyway in your "creating confusion in expanding the model". From my side, not knowing how the Parties (subtypes) are related to other entities, limits me from provided a totally correct answer (that will not change).

      Since you are providing the data in two tranches, therefore the answer will be in two tranches, and the second tranche will require changes to the first model. Not a complaint, just advising you beforehand, as that could be avoided if I saw all the data up front.

    2. It is really great that you appreciate the need for (a) modelling the data and (b) going through the science (documented for over 30 years) of Conceptual, Logical, then physical. That is exactly what I do. You cannot imagine the time and effort that is saved by the formal process.

      • However, that does not come across in my answers at SO, because it is a "question and answer" site, I have to answer at the level of the questioner. (I answer questions more fully than others, and even that, causes negative commentary !). Be assured that I go through the formal sequence.
    3. It must be mentioned that Relation Modelling is the work of giants in the industry, Dr E F Codd and R Brown in the 1980's. The Methodology is based on their work. IDEF1X became a NIST Standard in 1993. When I answer Data Modelling questions, I am not supplying some personal method that I wrote a book about, I stand on the shoulders of giants.

      • I adhere to the Relational Model by Dr E F Codd.

      • I reject the work of Date; Darwen; Fagin; etc, because it contradicts the Relational Model, it is anti-Relational.

      • ERD (P Chen) is pre-relational, pre-IDEF1X, and primitive in comparison. It has not concept of the Identifier, it cannot handle a Relational Key. It is staggering that ERD is still being taught as "relational", and IDEF1X is suppressed. For 30 years.

    Data Modelling

    1. Yes, a Supertype-Subtype structure is correct here. Unfortunately it is not common and therefore not commonly understood.
    • Subtypes existed long before the Relational Model, and continues to exist. IDEF1X, the only Standard for Relational Data Modelling, has specific symbols for it. ERD has nothing.
    1. Separately, even where Subtypes are implemented, they are implemented in very limited fashion. Subtypes have the effect of changing the Role of the Supertype. The correct implementation of that is very rare indeed, I have not seen this anywhere (except of course my own database implementations for customers, and a few high-end suppliers).

    2. Point is, that may look "complex" but it isn't, it is actually very simple.

      This is where Ken Downs and Chris Behrens confuse modelled simplicity (highly extensible) with unmodelled implementation (incorrect and un-extensible), due to the simplistic approach advised by dwarves such as Martin Fowler. No offence, I understand that people are attached to, and will defend, what they know, however limited as that may be.

      • notice that each Subtype is also a perfectly valid Entity (Table in the Physical, when we get to that stage) in its own right, and can stand on its own.

      • for the lower levels or transaction or function tables, which have Relations to these Subtypes, the trick is to use the correct Subtype (Role). The common mistake is they use Party, and then the meaning of the Subtype or Role, and the correct Referential Integrity is lost.

      • separately all the RoleNames are derived from Party, but that is not a valid reason to useParty instead of the correct Role.

      • Here you understand the data really well, but (no one has taught you this and) you have confused Roles and Subtypes.

      • BandMember and Fan are not Parties. They are Persons, first (and Person is a Party, second)

    3. In order to provide clarity re those points, at this Conceptual level, we need to work with Entities and Identifiers (not Attributes), rather than just Entities. Therefore I have provided that as well.

      • It appears you have ERwin (the best!); it allows you to view the single model at that level very conveniently. Do implement the Identifiers in the Entities, even at this abstract level.

    Obstacle

    I point these out before presenting the model, because you appear to be seriously interested in learning IDEF1X, the Standard methodolgy for modelling Relational Databases, with a view to easing your models in the future. SO or any website, is not a good medium for formal interactive education, but we will give it our best shot.

    1. In model (1), Band cannot be Independent (square corners): since it is identified as being Dependent on Party; it is a Subtype of Party; and it has the same Identifier.

    2. The missing Cardinality is critical. Putting it in will actually assist in resolving the model. I am not fussed re IDEF1X (circles) vs IEEE (crows feet), but I always put them in as I go, and keep changing them as the model progresses.

      • your model does not show that a Band is made up of one-to-many Members. Et cetera.
        .
        While programming can progress incrementally (once the definition is stable)), modelling does not. Eg. you can't model the Entities but not the Relations; the Relations but not the Cardinality. That is why they are different sciences, programmers do not make good modellers, and vice versa.
        .
    3. At this stage the Rules are also very important. Modelling is, in fact, modelling the Rules. Therefore correcting or modulating the Rules is part of the Modelling process.

      • A Fan can be a fan of of any Band or none at all is not reasonable. If a Person is none at all then they are a member of the general public, and they have no Relation to any Band. An ordinary Person.

      • A Fan has a Relation to at least one Band. In fact, having a Relation to a Band is what takes a Person out of that realm and causes storage of Fan details or specific fan-of-band details.

      • If there are is such an Entity as Fan with no Band (ie. you are storing details of that, separate to Fan as per my model), please advise, and I will change the model (paper is cheap!).

    4. Verb Phrases are also important at this stage; no less than my point re Rules and Cardinality above, it is a part of the modelling process, and it needs change/modulation as the model progresses. You will not believe how important it is to gett the Verb Phrases right. Putting them in may well have assisted you in clarifying Subtypes vs Roles. Here is a definition that every Data Modeller knows by heart.

      • The Entities are the Nouns in the model

      • The Relations are the Verbs, the actions that take place between the Nouns

      • The Verb Phrases define those actions (that's why they are accurately called Verb Phrases, it is not a funny name).

    As described in the IDEF1X Notation document, for Associative tables, read the Verb Phrase "through" them, to the parent on the other side of the association.

    • A Person makes one-to-many Bands, and is thus a Member

    • A Band is made up of one-to-many People, who are Members

    • A Person patronises a Band, which makes them a Fan (not merely a Person who has a row in the Fan table)

    • A Band depends on People, who are Fans

    Coming up with the shortest, most meaningful, Verb Phrase; not using simplistic words ("comprises" is to be avoided"), is a challenge for Modellers. Feel free to ameliorate the Verb phrases I supplied.

    Here is your Party Data Model at the Entity and Key levels in IDEF1X.

    Readers who are unfamiliar with the Standard for Modelling Relational Databases may find my IDEF1X Notation useful.

    Note

    All I have done is resolve the Subtypes; Roles; the Cardinality of the Relations, as identified above.

    1. The relevance of the Subtypes vs Roles will be more clear to you when you evaluate your second tranche or your transaction Entities (as you have stated, here we have only Identifying Entities).

    2. Identifiers. This is worth spelling out, not only for the purpose of clarifying the model, also because it is a good example of IDEF1X Identifiers being used, and the power they deploy. You have already indicated in your model that you understand that (solid lines), I am merely giving it the full treatment.

      • Person and Band are Subtypes of Party. They are also Roles of Party. Therefore we use PersonId and BandId from that point downward, not PartyId (even though it is PartyId).

      • When a Person plays the Role of Member, we use MemberId (which is PersonId, which is PartyId).

      • When a Person plays the Role of Fan, we use FanId (which is PersonId, which is PartyId).

    Let's say you were listing the Fans of a Band, your query is centred on Fan. If you had those Id Surrogate Keys on every table, you would be forced to join Person, then join Party. But with the Relational Identifiers you have, you can go directly to Party:

    SELECT  ...,
            Name  -- Party.Name
            ...
        FROM Party
        JOIN Fan
            ON PartyId = FanId
    

    and skip the Band table in-between. Yes, the truth is, Normalised Relational Database require fewer joins, less resources (processing, cache, disk I/O), and that is one reason why they perform so much better. The myth has no scientific basis.

    Please evaluate and ask specific questions.

    Update

    For a generic treatment of Subtypes, and implementation details for genuine SQL platforms, refer to my Subtype Document.