
Data Modeling: Logical Modeling Exercise


In trying to learn the art of data storage I have been trying to take in as much solid information as possible. PerformanceDBA posted some really helpful tutorials/examples in the following posts among others: is my data normalized? and Relational table naming convention. I already asked a subset question of this model here.

So to make sure I understood the concepts he presented and I have seen elsewhere I wanted to take things a step or two further and see if I am grasping the concepts. Hence the purpose of this post, which hopefully others can also learn from. Everything I present is conceptual to me and for learning rather than applying it in some production system. It would be cool to get some input from PerformanceDBA also since I used his models to get started, but I appreciate all input given from anyone.

As I am new to databases and especially modeling I will be the first to admit that I may not always ask the right questions, explain my thoughts clearly, or use the right verbiage due to lack of expertise on the subject. So please keep that in mind and feel free to steer me in the right direction if I head off track.

If there is enough interest in this I would like to take this from the logical to physical phases to show the evolution of the process and share it here on Stack. I will keep this thread for the Logical Diagram though and start a new one for the additional steps. For my understanding I will be building a MySQL DB in the end to run some tests and see if what I came up with actually works.

Here is the list of things that I want to capture in this conceptual model. Edit for V1.2

  1. The purpose of this is to list Bands, their members, and the Events that they will be appearing at, as well as offer music and other merchandise for sale
  2. Members will be able to match up with friends
  3. Members can write reviews on the Bands, their music, and their events.
    • There can only be one review per member on a given item, although they can edit their reviews and history will be maintained.
    • BandMembers will have the chance to write a single Comment on Reviews about the Band they are associated with. Collectively as a Band only one Comment is allowed per Review.
    • Members can then rate all Reviews and Comments but only once per given instance
  4. Members can select their favorite Bands, music, Merchandise, and Events
  5. Bands, Songs, and Events will be categorized into the type of Genre that they are and then further subcategorized into a SubGenre if necessary. It is ok for a Band or Event to fall into more than one Genre/SubGenre combination.
  6. Event date, time, and location will be posted for a given band and members can show that they will be attending the Event. An Event can be comprised of more than one Band, and multiple Events can take place at a single location on the same day
  7. Every party will be tied to at least one address and address history shall be maintained. Each party could also be tied to more than one address at a time (i.e. billing, shipping, physical)
  8. There will be stored profiles for Bands, BandMembers, and general members.

So there it is, maybe a bit involved but could be a great learning tool for many hopefully as the process evolves and input is given by the community. Any input?

[Image: logical data model diagram]

EDIT v1.1 In response to PerformanceDBA

U.3) That means no merchandise other than Band merchandise in the database. Correct ? That was my original thought but you got me thinking. Maybe the site would want to sell its own merchandise or even other merchandise from the bands. Not sure what mod to make for that. Would it require an entire rework of the Catalog section or just the identifying relationship that exists with the Band? Attempted a mod to sell both complete albums or song. Either way they would both be in electronic format only available for download. That is why I listed an Album as being comprised of Songs rather than 2 separate entities.

U.5) I understand what you bring up about the circular relation with Favorite. I would like to get to this “It is either one Entity with some form of differentiation (FavoriteType) which identifies its treatment” but how to is not clear to me. What am I missing here?

U.6) “Business Rules This is probably the only area you are weak in.”
Thanks for the honest response. I will readdress these but I hope to clear up some confusion in my head first with the responses I have posted back to you.

Q.1) Yes I would like to have Accepted, Rejected, and Blocked. I am not sure what you are referring to as to how this would change the logical model?

Q.2) A person does not have to be a User. They can exist only as a BandMember. Is that what you are asking?

Minor Issue

Zero, One, or More…Oops I admit I forgot to give this attention when building the model. I am submitting this version as is and will address in a future version. I need to read up more on Constraint Checking to make sure I am understanding things.

M.4) Depends if you envision OrderPurchase in the future. Can you expand as to what you mean here?

[Image: logical data model diagram, V1.1]

EDIT V1.2 In response to PerformanceDBA input...

Lessons learned.

  1. I was mixing the concept of Identifying / Non-Identifying and Cardinality (i.e. Genre / SubGenre), and doing so inconsistently to make things worse.
  2. Associative Tables are not required in Logical Diagrams as their many-to-many relationships can be depicted and then expanded in the Physical Model.
  3. I was overlooking the Cardinality in a lot of the relationships
  4. The importance of reading through relationships using effective Verb Phrases to make sure I am modeling what I want to accomplish.

U.2) In the concept of this model it is only required to track a Venue as a location for an Event. No further data needs to be collected. With that being said Events will take place on a given EventDate and will be hosted at a Venue. Venues will host multiple events and possibly multiple events on a given date. In my new model my thinking was that EventDate is already tied to Event . Therefore, Venue will not need a relationship with EventDate. The 5th and 6th bullets you have listed under U.2) leave me questioning my thinking though. Am I missing something here?

U.3) Is it time to move the link between Item and Band up to Item and Party instead? With the current design I don't see a possibility to sell merchandise not tied to the band as you have brought up.

U.5) I left as per your input rather than making it a discrete Supertype/Subtype Relationship as I don’t see a benefit of having that type of roll up.

Additional Revisions

AR.1) After going through the exercise for FavoriteItem, I feel that Item to Review requires a many-to-many relationship so that is indicated. Necessary?

[Image: logical data model diagram, V1.2]

Ok here we go for v1.3

I took a few days on this version, going back and forth with my design. Once the logical process is complete, as I want to see if I am on the right track, I will go through in depth what I had learned and the troubles I faced as a beginner going through this process. The big point for this version was it took throwing in some Keys to help see what I was missing in the past. Going through the process of doing a matrix proved to be of great help also. Regardless of anything, if it wasn't for the input given by PerformanceDBA I would still be a lost soul wandering in the dark. Who knows, my current design might reaffirm that I still am, but I have learned a lot so I know I at least have a flashlight in my hand.

At this point in time I admit that I am still confused about identifying and non-identifying relationships. In my model I had to use non-identifying relationships with non nulls just to join the relationships I wanted to model. In reading a lot on the subject there seems to be a lot of disagreement and indecisiveness on the subject so I did what I thought represented the right things in my model. When to force (identifying) and when to be free (non-identifying)? Anyone have inputs?

[Image: logical data model diagram, V1.3]

EDIT V1.4

Ok took the V1.3 inputs and cleaned things up for this V1.4

Currently working on a V1.5 to include attributes.

[Image: logical data model diagram, V1.4]

EDIT V1.6

Okay, it has been some time since I have posted on here but the work on this project is still ongoing. I am posting V1.6 now which includes a number of changes from the last posting of V1.4. This version shows the further evolution of the Keys. It still does not include the attributes or any AK's or IE's. I have started working on the physical model and used that to help work through the attributes and to try and shed some light on the problems I am having with defining the AK's and IE's. The next posting of the Logical Model will include these keys and the attributes.

[Image: logical data model diagram, V1.6]


Solution

  • Method

    I will cover specifics, but I will cover one or two Subject Areas completely, not all. You can pick that up and apply it to all subject Areas.

    I have not responded to the core Subject Area, because we are still dealing with Identifying Entities. When that is resolved the Reviews, etc will be easier; the Transaction Entities are Dependent on the Identifying Entities.

    Direction

    D.1) I know that I stated that I need to see the whole model. There is one exception. Historic or Temporal or Audit data (eg. the Edit and stored versions). At this early stage, they can be set aside; to be implemented just before completion of the Logical Model. This is in recognition that (a) they are simple Dependents of some parent (b) the parents need to be modelled in relation to all other tables first, and (c) to exclude unnecessary complications, and thus allow us to concentrate on the relevant field.

    • in particular, you can ignore the tense in the Verb Phrases (every location of a version table would otherwise require Has/Had). Stay with present tense for now, because the focus is modelling, not archiving.

    Unresolved

    U.1) Optional Parent
    That is completely disallowed. Not just by IDEF1X, but by any notion of Integrity. If the FK Reference is defined, then there must be a Parent. To allow optional parents, the FK Reference must be removed (or not implemented). Such a condition would exclude the result from qualifying as a "Relational database", by definition. Eg. Address:Order.

    • Of course, in developed countries, an Order must have an Address for legal or taxation reasons; that is separate to the Standard requirement issue.
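The rule in U.1 (if the FK Reference is defined, there must be a Parent) can be tried out with a small throwaway schema. This is only a sketch under assumptions: it uses Python's built-in `sqlite3` module rather than MySQL, the table and column names are hypothetical, and `CustomerOrder` stands in for Order because `ORDER` is a reserved word in SQL.

```python
import sqlite3


def make_order_db():
    """Build an in-memory schema; all names here are hypothetical."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
    conn.executescript(
        """
        CREATE TABLE Address (
            AddressId INTEGER PRIMARY KEY,
            Street    TEXT NOT NULL
        );
        -- NOT NULL plus REFERENCES: every order row must name an existing parent.
        CREATE TABLE CustomerOrder (
            OrderId   INTEGER PRIMARY KEY,
            AddressId INTEGER NOT NULL REFERENCES Address (AddressId)
        );
        """
    )
    return conn


def try_insert_order(conn, order_id, address_id):
    """Return True if the order was stored, False if a constraint rejected it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO CustomerOrder (OrderId, AddressId) VALUES (?, ?)",
                (order_id, address_id),
            )
        return True
    except sqlite3.IntegrityError:
        return False
```

With one Address row present, an order pointing at it is accepted, while an order with a missing or unknown `AddressId` is rejected by the database itself rather than by application code.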

    U.2) Event
    Party::PartyAddress is correct; Address::PartyAddress is correct. Event::Address needs work. Address is an Identifying Reference table; if used, it would be the parent, Event would be the child. I leave it to you to identify/model multiple Events to a location, and Events at one or multiple locations.

    • There may be a Venue involved. Or an EventOccurrence.

    • But if it is a generic Event which happens at multiple locations, that does not need an Entity, the Address is already in Order.

    U.3) Assuming Catalog is an entry in the traditional sense (JCPenney 2011), a list of items for sale or hire.

    • OrderSaleItem is correct

    • Critical point. Catalog is Dependent, and can exist only in the context of a Band, as an Asset. Fine. That means no merchandise other than Band merchandise in the database. Correct ?

    • I can see how "Evening performance with the Blues Brothers" is an Event that can be ordered, invoiced, and paid. Also reviewed, commented, etc.

    • I can't see how Song fits into that. Are the bands selling albums, songs, or both ?

    • Is there no other Band merchandise: concert/event souvenirs; poster; engraved shot glasses ?

    • Consistent with the naming conventions that you reference, and the rest of the database, Catalog (the content) should be named Item (the row). You have already (naturally ?) used that in OrderSaleItem (as opposed to OrderSaleCatalog).

    U.4) Genre

    • No problem with an Item is classified by one-to-many Genres.

    • I think additionally a Genre classifies one-to-many Items. The Relation is many-to-many (which will be resolved as an Associative table when we get to the Physical).

    U.5) Favorite
    The Cardinality of Item::Favorite is reversed. When you correct that, the Favorite Subject Area will require further modelling.

    • Circular relation or dual paths between the same pair of Entities is a signal of an unresolved model. Generally one is correct and the other is redundant. (There are exceptions, but not here; and when this happens the Verb Phrases differentiate them.)

    • Either Band::Favorite xor Item::Favorite is correct, not both.

    • Item::Favorite seems to be correct, because Band is already identified in Item

    • Likewise, one Favorite Entity for bands and merchandise does not sound solid. Every Identifier in the single Favorite Entity is a Party. It would break when we Normalise, might as well demand that the Identifiers be clarified at this stage. It is either one Entity with some form of differentiation (FavoriteType) which identifies its treatment; or one Favorite for bands and another for merchandise, in which case differentiation is not required, ambiguity is eliminated.

    U.6) Business Rules This is probably the only area you are weak in. General response. You have done the tasks separately (all the modelling vs writing BRs). These do not match the model. When you go through the next cycle, take the Business Rules as directives, and modulate them at the same time, as with the Entities, the Relations, and the Verb Phrases.

    Question

    Q.1) User/Friend
    You have the essence of it perfectly. And the Cardinality of the Relations. (Full treatment on this one.) That is correct for Accepted Friend.

    • therefore the tense should be past (go with the majority rows)

    • Requested, and pending Accepted, are the minority. Easily implemented in an IsAccepted Bit or Boolean.

    • Later you may have IsRejected or IsBlocked (the latter should be a separate Entity).

    • Is that what you require ?
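As a rough illustration of the Friend treatment in Q.1, the sketch below stores a friend request as a row keyed by the two Users, with an `IsAccepted` flag that starts at 0 (Requested) and is set to 1 on acceptance. It runs on Python's built-in `sqlite3` rather than MySQL, and the column names and helper functions are assumptions made for the example, not part of the model under discussion.

```python
import sqlite3


def make_friend_db():
    """In-memory schema; names are hypothetical."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(
        """
        CREATE TABLE User (
            UserId INTEGER PRIMARY KEY,
            Name   TEXT NOT NULL
        );
        -- Both halves of the key reference User: one row per (requester, friend).
        CREATE TABLE Friend (
            UserId     INTEGER NOT NULL REFERENCES User (UserId),
            FriendId   INTEGER NOT NULL REFERENCES User (UserId),
            IsAccepted INTEGER NOT NULL DEFAULT 0 CHECK (IsAccepted IN (0, 1)),
            PRIMARY KEY (UserId, FriendId),
            CHECK (UserId <> FriendId)
        );
        """
    )
    return conn


def request_friend(conn, user_id, friend_id):
    """Record a request; the row starts as Requested (IsAccepted = 0)."""
    with conn:
        conn.execute(
            "INSERT INTO Friend (UserId, FriendId) VALUES (?, ?)",
            (user_id, friend_id),
        )


def accept_friend(conn, user_id, friend_id):
    """Mark an existing request as Accepted."""
    with conn:
        conn.execute(
            "UPDATE Friend SET IsAccepted = 1 WHERE UserId = ? AND FriendId = ?",
            (user_id, friend_id),
        )


def accepted_friends(conn, user_id):
    """Return the FriendIds that this user requested and that were accepted."""
    rows = conn.execute(
        "SELECT FriendId FROM Friend "
        "WHERE UserId = ? AND IsAccepted = 1 ORDER BY FriendId",
        (user_id,),
    )
    return [row[0] for row in rows]
```

The composite primary key is what prevents a second request between the same pair in the same direction; Blocked is deliberately left out, in line with the suggestion that it be a separate Entity.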

    Q.2) What is the basis on which a Person is zero-to-many Users ?

    Minor Issue

    M.1) Singular only.

    M.2) Party Has zero-to-many Addresses. I would think they must have one, in order to transact business (but perhaps not for all Users).

    M.3) Order May Have zero-to-many Payments. "Requires" means that first Payment has to be inserted at the same time as Order.

    • Likewise, for any mandatory children (one-to-many as opposed to zero-to-many) that first child must be inserted at the same time as the parent. This is done via Transactions in enterprise databases, because Immediate Constraint Checking (not Deferred) is implemented; and the small end of town fight over silly things like Deferred Constraint Checking is "better" and then spend half their life figuring out how not to get caught in the infinite loops they created, which trap them. MySQL does not have any at all, so nothing to worry about for this implementation.
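To make the "first child must be inserted at the same time as the parent" point in M.3 concrete, here is a minimal sketch of inserting an order and its first payment inside one Transaction, so that either both rows are stored or neither is. It assumes Python's built-in `sqlite3`; the names `CustomerOrder`, `Payment`, `PaymentNo` and `place_order` are hypothetical, and the sketch only demonstrates the atomic insert, not any particular platform's constraint-checking mode.

```python
import sqlite3


def make_payment_db():
    """In-memory schema; names are hypothetical."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(
        """
        CREATE TABLE CustomerOrder (
            OrderId INTEGER PRIMARY KEY
        );
        CREATE TABLE Payment (
            OrderId   INTEGER NOT NULL REFERENCES CustomerOrder (OrderId),
            PaymentNo INTEGER NOT NULL,
            Amount    NUMERIC NOT NULL CHECK (Amount > 0),
            PRIMARY KEY (OrderId, PaymentNo)
        );
        """
    )
    return conn


def place_order(conn, order_id, first_payment_amount):
    """Insert the parent and its first child as one unit of work."""
    with conn:  # commits if both inserts succeed, rolls back both otherwise
        conn.execute(
            "INSERT INTO CustomerOrder (OrderId) VALUES (?)",
            (order_id,),
        )
        conn.execute(
            "INSERT INTO Payment (OrderId, PaymentNo, Amount) VALUES (?, 1, ?)",
            (order_id, first_payment_amount),
        )
```

If the payment is rejected (for example a non-positive amount), the order row inserted a moment earlier is rolled back as well, so an order never exists without its first payment.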

    M.4) OrderSaleItem should be OrderItem xor Order should be OrderSale. Depends if you envision OrderPurchase in the future.

    Subject Area Example

    Readers who are unfamiliar with the Standard for Modelling Relational Databases may find IDEF1X Notation useful.

    As stated, I am not providing a finished Data Model, only guidance. This is just one progression of one selected Subject Area. It is not "right" or complete in any way.

    • Your Verb Phrases are excellent. I have provided alternatives for you to consider, they are not "right" or "better". You need to choose and progress them or your own. The goal being the most concise and accurate VP in each case.

    • No suggestion that Person is correct and User is incorrect, that is pending your answer. But I had to use something in the model; since you have modelled them as separate, a counterpoint may be interesting to evaluate.

    So go ahead and progress the model, then post again (just edit the question, leaving the header paras, and replacing the rest).

    V1.1 and Response

    That is certainly a progression.

    I have re-numbered the items in pseudo-legal format, including the section headings, so that we can keep the numbering throughout, and keep adding to it. Actually it really eases the SO editing problems as well.

    U.3) Would it require an entire rework of the Catalog section or just the identifying relationship that exists with the Band?

    • No. That's the great thing about working at this level, the decisions you make here will be the railroad tracks that the data runs on, as freight, or does not run on (and thus needs alternate transport and heavy lifting to derive, in the form of masses of code or an additional data warehouse). And the decisions here are cheap (modelling time, paper).

    • Right now an Item exists only in the context of a Band. It is Dependent. To allow non-band merchandise, it needs to be Independent. And then the existing super/subtype cluster needs rework.

    Attempted a mod to sell both complete albums or song. Either way they would both be in electronic format only available for download. That is why I listed an Album as being comprised of Songs

    • OK. But now you can only sell albums, not songs.

    rather than 2 separate entities.

    • Not sure what you mean (you have two separate entities).

    • It appears you have not seen my Subject Area Example. Note that if you open it now, it contains bits that I have added V1.1; I have not changed what was there yesterday, the V1.0 response.

    • Actually that means you should go through my V1.0 Answer again, while viewing the Example.

    U.5) ... but how to is not clear to me. What am I missing here?

    • An example of one Entity with differentiation is any of the Supertype/Subtype clusters you have. The Favorite is the Supertype, BandFavourite and ItemFavourite are subtypes; allowing each to reference Band xor Item respectively.

    • You have modelled ItemFavourite. Now the question is, does the fact of an ItemFavourite imply that the Band is Favourite; or is BandFavourite a discrete fact ? In the example, I have modelled the latter, without the Favourite::ItemFavourite/BandFavourite structure.
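The second option in U.5 (one Favourite for bands and another for items, each a discrete fact) might look roughly like the following. This is a sketch only, using Python's built-in `sqlite3`; the keys, column names and helper functions are assumptions chosen for illustration, with Item shown as Dependent on Band via a compound key.

```python
import sqlite3


def make_favourite_db():
    """In-memory schema; names and keys are hypothetical."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(
        """
        CREATE TABLE User (UserId INTEGER PRIMARY KEY);
        CREATE TABLE Band (BandId INTEGER PRIMARY KEY, Name TEXT NOT NULL);
        -- Item is Dependent on Band: the Band key is part of the Item key.
        CREATE TABLE Item (
            BandId INTEGER NOT NULL REFERENCES Band (BandId),
            ItemNo INTEGER NOT NULL,
            Title  TEXT NOT NULL,
            PRIMARY KEY (BandId, ItemNo)
        );
        -- Two separate facts, each with exactly one unambiguous parent.
        CREATE TABLE FavouriteBand (
            UserId INTEGER NOT NULL REFERENCES User (UserId),
            BandId INTEGER NOT NULL REFERENCES Band (BandId),
            PRIMARY KEY (UserId, BandId)
        );
        CREATE TABLE FavouriteItem (
            UserId INTEGER NOT NULL REFERENCES User (UserId),
            BandId INTEGER NOT NULL,
            ItemNo INTEGER NOT NULL,
            PRIMARY KEY (UserId, BandId, ItemNo),
            FOREIGN KEY (BandId, ItemNo) REFERENCES Item (BandId, ItemNo)
        );
        """
    )
    return conn


def favourite_band_names(conn, user_id):
    """Bands the user explicitly chose as favourites."""
    rows = conn.execute(
        "SELECT b.Name FROM FavouriteBand f JOIN Band b ON b.BandId = f.BandId "
        "WHERE f.UserId = ? ORDER BY b.Name",
        (user_id,),
    )
    return [row[0] for row in rows]


def favourite_item_titles(conn, user_id):
    """Items the user explicitly chose as favourites."""
    rows = conn.execute(
        "SELECT i.Title FROM FavouriteItem f "
        "JOIN Item i ON i.BandId = f.BandId AND i.ItemNo = f.ItemNo "
        "WHERE f.UserId = ? ORDER BY i.Title",
        (user_id,),
    )
    return [row[0] for row in rows]
```

Because the two tables are independent, choosing a favourite Item does not make its Band a favourite; neither table needs a nullable Foreign Key or a type column.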

    Q.1) Yes I would like to have Accepted, Rejected, and Blocked. I am not sure what you are referring to as to how this would change the logical model?

    • No change (I already stated it was pretty complete) to V1.0, but you might need an additional Entity.

    • You need three Bit or Boolean indicators in Friend. That will service these statuses:

    • Requested (but not Accepted)

    • Requested & Accepted

    • But Blocked is not a Friend (or could have been a Friend previously, but not since being Blocked). So either the Entity name has to change to reflect that (no change to the two Relations) xor Blocked has to be a separate Entity. Two separate meanings for the second Relation leads to complexity, therefore I would go with the latter.

    With the former, we have additional statuses:

    • Blocked
    • Then the Verb Phrases need change (and I will include the RoleName for clarity), and one of them has an alternate meaning.
    • (It will be much more clear at the Attribute level Model, that's why we model in pictures, not words; so I have included it.)

    Q.2) A person does not have to be a User. They can exist only as a BandMember. Is that what you are asking?

    • No. Why do we need to differentiate Person and User ? What are the separate actions or attributes ? Thus far, I see Person and User as the same Entity; Person is a User with no activity.

    • This is the last item, holding us back from dealing with the core Subject Area.

    M.3) I need to read up more on Constraint Checking to make sure I am understanding things.

    • Don't worry about that now; I was giving you reason to keep it simple (the non-compliant SQL databases appear to simplify things but actually they make it more complex). MySQL has none of those capabilities, so you can eliminate consideration of the platform, and just model the Cardinality meaningfully.

    M.4) Depends if you envision OrderPurchase in the future. Can you expand as to what you mean here?

    • In the context of the Model. You provide the structures to make SalesOrders (of Items). Therefore Item, Order and OrderItem.

    • But if you provided the structures to track PurchaseOrders as well (to purchase Items as well as office supplies, rent, whatever), then you need to differentiate Sales Orders and Purchase Orders. Therefore:

    • Item

    • OrderSale and OrderSaleItem

    • OrderPurchase and OrderPurchaseItem

    Version 1.1

    U.2) Event Progressed

    • EventDate looks good. I would define the Relation as Event Was Performed On EventDate.

    • Whereas ItemGenre is perfect, Event::Venue Needs work. This is a mistake you make consistently, so an explanation is called for.

    • You have modelled Venue correctly, it is Independent and does exist outside the context of Event. But Event May Be [Held] At zero-to-many [Independent] Venues is not possible.

    • Events are held at many Venues, and Venues host many Events. If that was all, since this is the Logical Level, you can draw a many-to-many Relation, and you are done. At the Physical level, that Relation is resolved by implementing an Associative Table, of which the PK is the two parent PKs, and there is no data. (Enemy is a good example.)

    • But if there is data (eg. you need to track the date or number of attendees or whatever), then it is not an Associative Table, it is another Entity. A Thing that Takes Place between Event and Venue.

    • EventDate is a good candidate. We already have that, and the date. Just add Venue and stir. I would call the Thing that Takes Place between Event and Venue a Performance.

    • Likewise, EventAddress has progressed but is not complete.

    • Do Events have Addresses or Venues have Addresses ? (model it, no need for words)

    • If Venue: do you need all the historic Addresses for the Venue (like Party), or just the current one (like Order) ?
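The distinction drawn in U.2 between a pure Associative Table (two parent keys, no data) and an Entity that carries data of its own can be sketched side by side. The example below is an assumption-laden illustration on Python's built-in `sqlite3`: `EventVenue`, `PerformanceDate`, `Attendees` and the helper functions are hypothetical names, and the Performance key shown is just one plausible choice.

```python
import sqlite3


def make_event_db():
    """In-memory schema; names and keys are hypothetical."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(
        """
        CREATE TABLE Event (EventId INTEGER PRIMARY KEY, Name TEXT NOT NULL);
        CREATE TABLE Venue (VenueId INTEGER PRIMARY KEY, Name TEXT NOT NULL);

        -- Pure Associative Table: the PK is the two parent PKs, no data.
        CREATE TABLE EventVenue (
            EventId INTEGER NOT NULL REFERENCES Event (EventId),
            VenueId INTEGER NOT NULL REFERENCES Venue (VenueId),
            PRIMARY KEY (EventId, VenueId)
        );

        -- An Entity in its own right: it has a date and data to track.
        CREATE TABLE Performance (
            EventId         INTEGER NOT NULL REFERENCES Event (EventId),
            VenueId         INTEGER NOT NULL REFERENCES Venue (VenueId),
            PerformanceDate TEXT    NOT NULL,
            Attendees       INTEGER NOT NULL DEFAULT 0 CHECK (Attendees >= 0),
            PRIMARY KEY (EventId, VenueId, PerformanceDate)
        );
        """
    )
    return conn


def add_performance(conn, event_id, venue_id, date, attendees):
    """Record one Performance of an Event at a Venue on a given date."""
    with conn:
        conn.execute(
            "INSERT INTO Performance (EventId, VenueId, PerformanceDate, Attendees) "
            "VALUES (?, ?, ?, ?)",
            (event_id, venue_id, date, attendees),
        )


def total_attendance(conn, event_id, venue_id):
    """Sum attendees over all Performances of an Event at a Venue."""
    row = conn.execute(
        "SELECT COALESCE(SUM(Attendees), 0) FROM Performance "
        "WHERE EventId = ? AND VenueId = ?",
        (event_id, venue_id),
    ).fetchone()
    return row[0]
```

`EventVenue` can only say that an Event and a Venue are related once; `Performance` can hold the same pair on several dates, each with its own attendance figure.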

    M.5) SubGenre. Can you explain why SubGenre is (a) Independent and (b) the Relation is Non-Identifying.

    M.6) Item Is zero-to-many Favourites. Therefore: Item Is a Favourite of zero-to-many Users. Likewise, Each User Chooses zero-to-many Favourites. Therefore Each User Chooses zero-to-many Favourite Items.

    V1.2 and Response

    Great Progress.

    U.2) Event Further Progressed

    Going by your Edit as well as the new Requirements, some yes and some no. All the other Subject Areas of the Data Model are pretty much complete (for Logical), this one area is confused, not nearly as resolved. Partly because of the added Requirements (no complaint, that happens in real life; it is about how you handle it).

    The main point I will make here is that the Data Model should always model the real world, as opposed to only the business Requirement. That (a) insulates the DM from the effect of change and (b) provides a solid platform for added Requirements. That does not mean you have to model the whole real world, but the parts of it that you do model must reflect reality and not be squished up to fill just the Requirement.

    Second, there is lack of clarity about the distinctions between Event, Band-Event, Performance, etc. Right now an Event is a Party-Band-Item-Event. That's fine, but it does not work for the new style Event per Requirement.

    Third, you have a good handle on Address re Party and Order, but not re Venue.

    • Since you are accepting the Standard-compliant model and therefore the treatment, Address is a Reference table.

    • It is Independent (square corners)

    • Actually, you can place Address and everything above it on page one; making this part of the model page two, and have Address only on this page.

    • Correctly modelled: A Party has a history of Addresses. They must have at least one current { IsBilling | IsShipping | IsPhysical } Address, based on whatever activity is being executed.

    • Correctly modelled: An Order has one IsBilling Address (if you need IsShipping, you need to add a separate Relation).

    • Address is not a child of Venue (also Independent, correct). I do not think a Venue is located in zero-to-many Addresses. (Maybe that is the old Cardinality-reversed bug, but I am not sure, due to the other confusion re Event and Venue.)

    • Actually Address::Order is suspicious. (Q.3) Do you want Order to reference any valid Address, or a specific address for the Party executing the Order ?

    • Back to Event. Accepting EventDate as declared. That's fine but then Reviews etc, apply to the generic concert and not the single concert which they performed on mushrooms. Go for V1.3.

    • Your terminology re Event, etc is consistent with the Requirement, etc. but it does not support the Requirement as stated.

    • So let us start using "Event" the way it is used in the real world, and model it that way. What we have been calling "Event", the Party-Band-Item, is actually a Performance. And not a generic one that is scheduled, but a single one at a specific Venue.

    • That is either what you meant with EventDate, or EventDate resolves into Performance.

    If you do not mind, I will avoid typing one thousand words, and give you a picture. Subject Area Example V1.2

    • Notice that the multiple Bands per Event is resolved.

    • And the Verb Phrases are straight from heaven. An Address hosted multiple Venues, each of which catered multiple Events, each of which is multiple Performances, each of which is one Party-Band-Item.

    U.3) Is it time to move the link between Item and Band up to Item and Party instead? With the current design I don't see a possibility to sell merchandise not tied to the band as you have brought up.

    • First, we need to use Relational terminology, not because I am a pedant, but because the real gurus say it really helps to make the transition to the Relational world.

    • Second, we cannot accomplish that by "moving the Relation".

    • You have to model non-Band merchandise: how you are going to sell it; track it; get paid for it. Whether you want Reviews and Responses, etc. I do not see what Party has to do with it, and right now we are selling Band-Items, not Party-Items. Consider the Referential Integrity issues.

    Version 1.2

    AR.1) After going through the exercise for FavoriteItem, I feel that Item to Review requires a many-to-many relationship so that is indicated. Necessary?

    • In V1.1, An Item had many Reviews, and a Review was about one Item. A Person generated many Reviews (one per Item). That is logical.

    • A Review is about many Items is not reasonable.

    • If anything, now that FavouriteItem/FavouriteBand is resolved, Review needs likewise resolution and distinction: do we need to differentiate BandReview from ItemReview; does a good/bad ItemReview indicate a good/bad BandReview or are they discrete ?

    • A Review (as it stands) cannot be about either a Band or an Item. That means two Foreign Keys, and one of them will be Null, and Null FKs are not allowed. Item and Band are already differentiated, and that differentiation is mature.

    • ItemReviews can be summarised, etc, but that is a different story.

    U.7) That leaves us with a new issue to resolve. If a Review can be about a Band or Album or Song or Performance, how do we ensure Referential Integrity ? We do not need an AlbumReview to reference a SongReview, etc. Model it.

    R.5) The model currently provides Genre at the Item level, that means Album and Song (Merchandise can be disallowed via a CHECK Constraint). Not Band. That may be enough, given that (a) bands change over time, (b) that kind of classification at the Item level is more precise, and (c) Band Genre can be easily derived from their Albums or Songs.

    • If you need separate Band Genres, you need to add that.

    • What about Event Genre ? If you need it, I think it will be one Genre per Event.

    • Keep in mind that tables like Venue and Genre are serious search criteria in a major database. Vectors for analysis.

    • The Data Warehouse boys need to add this in as Dimensions to their Facts; in a properly modelled Database, they already exist as Dimensions to Facts. Show me all the Venues with "Folk Music" Events scheduled that attracted more than 10,000 People is dead easy.

    • Discussion Point. Not saying the above is incorrect. What I have found in both Databases and iTunes is, precision counts. Why have laissez faire Genre::Several things when you can have Genre ::Specific Thing. If you had Genre::Song only, and Song has one Genre only, then Album and Band are precise roll-ups. The way we have it now, it depends on the music knowledge of the data entry person, and Genre::Thing is many, so it is loose. Genre::Song is tight.

    R.6) members can show that they will be attending the Event is not modelled. Also clarify interest vs booking vs attendance.

    R.8) Is not modelled.

    M.3) The issue is closed, but the Verb Phrase remains unchanged.

    M.7) Logical Model vis-a-vis Associative tables. Now that that issue is closed, remove any Associative tables for the Logical model; any remaining tables (between two parents) will contain data. That means, go through all the Dependent tables and remove any that do not have data. Thus V1.3 should be less cluttered.

    M.8) Item is OrderItem.

    M.9) Now that Party-Person-User is resolved. An Exclusive Subtype structure requires a Discriminator, and the Constraints will be used to enforce Integrity. Where there are many, PartyType is the way to go. But for just two, a column IsBand or IsPerson is adequate.
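One common way to enforce an Exclusive Subtype with a Discriminator, as raised in M.9, is to repeat the discriminator in each subtype and reference the supertype on the pair of columns. The sketch below uses that composite-key variant on Python's built-in `sqlite3`; it is an assumption for illustration and not necessarily the constraint style intended in the answer, and the codes `'B'` and `'P'` and the `add_party` helper are hypothetical.

```python
import sqlite3

# Hypothetical mapping from discriminator code to subtype table.
SUBTYPE_TABLE = {"B": "Band", "P": "Person"}


def make_party_db():
    """In-memory schema; names and codes are hypothetical."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(
        """
        CREATE TABLE Party (
            PartyId   INTEGER PRIMARY KEY,
            PartyType TEXT NOT NULL CHECK (PartyType IN ('B', 'P')),
            UNIQUE (PartyId, PartyType)  -- target for the subtype references
        );
        CREATE TABLE Band (
            PartyId   INTEGER PRIMARY KEY,
            PartyType TEXT NOT NULL DEFAULT 'B' CHECK (PartyType = 'B'),
            Name      TEXT NOT NULL,
            FOREIGN KEY (PartyId, PartyType) REFERENCES Party (PartyId, PartyType)
        );
        CREATE TABLE Person (
            PartyId   INTEGER PRIMARY KEY,
            PartyType TEXT NOT NULL DEFAULT 'P' CHECK (PartyType = 'P'),
            Name      TEXT NOT NULL,
            FOREIGN KEY (PartyId, PartyType) REFERENCES Party (PartyId, PartyType)
        );
        """
    )
    return conn


def add_party(conn, party_id, party_type, name):
    """Insert the supertype row and its matching subtype row together."""
    table = SUBTYPE_TABLE[party_type]  # KeyError for an unknown discriminator
    with conn:
        conn.execute(
            "INSERT INTO Party (PartyId, PartyType) VALUES (?, ?)",
            (party_id, party_type),
        )
        conn.execute(
            f"INSERT INTO {table} (PartyId, Name) VALUES (?, ?)",
            (party_id, name),
        )
```

A Party recorded as a Band cannot also appear in Person, because the Person row would need a `(PartyId, 'P')` pair that does not exist in Party.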

    M.10) You have corrected the cardinality-reversed bug, but some Verb Phrases are still going the wrong way.

    27 Jan 11

    Actually, I think a lot of these issues would be clearer if we move into the Logical Key/Attribute level (rather than just Entity Relation level). And it is high time we did. For example:

    Q.3) Order:Address is suspicious. The constraint is not quite correct because that would allow the order to have any Address, not an Address that is specific to the Party executing the order.

    But since you are MySQL, which has no Referential Integrity, you may not be aware of how it is done in real SQL, so I will provide the FK Definitions, which happen to be RI Constraints as well. It is kind of unfair to expect you to understand my terse statements, which are based in the RM, Normalisation and supported by SQL, when you do not have SQL.

    • In order for the two constraints to be true, since Party must be the same in each Constraint (there is only one Order.PartyId), only the subset of PartyAddress which belongs to PartyId, will be allowed.
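The idea in Q.3 (an Order may only use an Address that belongs to the Party placing it) can be approximated with two Foreign Keys that share the Order's single `PartyId`. What follows is a sketch under assumptions, not the FK definitions referred to above: it runs on Python's built-in `sqlite3`, `CustomerOrder` stands in for Order, and all column names and the `try_order` helper are hypothetical.

```python
import sqlite3


def make_qualified_order_db():
    """In-memory schema; names are hypothetical."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(
        """
        CREATE TABLE Party   (PartyId   INTEGER PRIMARY KEY);
        CREATE TABLE Address (AddressId INTEGER PRIMARY KEY);
        CREATE TABLE PartyAddress (
            PartyId   INTEGER NOT NULL REFERENCES Party (PartyId),
            AddressId INTEGER NOT NULL REFERENCES Address (AddressId),
            PRIMARY KEY (PartyId, AddressId)
        );
        CREATE TABLE CustomerOrder (
            OrderId   INTEGER PRIMARY KEY,
            -- Constraint 1: the order belongs to a Party.
            PartyId   INTEGER NOT NULL REFERENCES Party (PartyId),
            AddressId INTEGER NOT NULL,
            -- Constraint 2: the address must be one of that same Party's addresses.
            FOREIGN KEY (PartyId, AddressId)
                REFERENCES PartyAddress (PartyId, AddressId)
        );
        """
    )
    return conn


def try_order(conn, order_id, party_id, address_id):
    """Return True if the order was stored, False if a constraint rejected it."""
    try:
        with conn:
            conn.execute(
                "INSERT INTO CustomerOrder (OrderId, PartyId, AddressId) "
                "VALUES (?, ?, ?)",
                (order_id, party_id, address_id),
            )
        return True
    except sqlite3.IntegrityError:
        return False
```

An address that exists but is registered to a different Party is rejected, because the `(PartyId, AddressId)` pair is not present in `PartyAddress`.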

    Address Qualification Example

    Continued in Part II ...