Search code examples
mysqldatabase-designtreeparent-childentity-relationship

Which alternative to choose for a database redesign (sports stats, competitions, geographic scopes)?


I'm extending a database, which is about sports stats. The different leagues, or more precisely "competitions", can be represented by a tree structure analogous to certain geographic areas (scope/range of participating teams):

 1. Continent Competition    (continent)
                                  ↑
 2. Country Competition       (country)
                                  ↑
 3. Region Competition        (region)
                                  ↑
 4. State Competition          (state)
                                  ↑
 5. District Competition     (district)

(The arrows represent the FKs to the parent geo area entity (tree structure).)

Here' the design visually:

Valid XHTML
(source: kawoolutions.com)

I created a hierarchy of "geo areas" basically just to implement the scoped competitions hierarchy. Each competition entity has a reference to a geo area, e.g. each state knows its (sub-country) region (parent_id).

Each geo area - competition combination encapsulates three pieces of information: competition type (league, cup, playoffs, ...), geo area type (the scope of participating teams!), and geo area name.

Here are the competitions I wanted to insert into the DB:

INSERT INTO Competitions (geo_area_id, type, label) VALUES ( 88, 'league', 'Deutsche Meisterschaft');         -- 'Country Championships Germany'
INSERT INTO Competitions (geo_area_id, type, label) VALUES (248, 'league', 'Regionalmeisterschaft Nord');     -- 'Region Championships North'
INSERT INTO Competitions (geo_area_id, type, label) VALUES (249, 'league', 'Regionalmeisterschaft West');     -- 'Region Championships West'
INSERT INTO Competitions (geo_area_id, type, label) VALUES (250, 'league', 'Regionalmeisterschaft Sued');     -- 'Region Championships South'
INSERT INTO Competitions (geo_area_id, type, label) VALUES (251, 'league', 'Regionalmeisterschaft Ost');      -- 'Region Championships East'
INSERT INTO Competitions (geo_area_id, type, label) VALUES (258, 'league', 'Landesmeisterschaft Hessen');     -- 'State Championships Hesse'
INSERT INTO Competitions (geo_area_id, type, label) VALUES (268, 'league', 'Bezirksmeisterschaft Darmstadt'); -- 'District Championships Darmstadt'
INSERT INTO Competitions (geo_area_id, type, label) VALUES (269, 'league', 'Bezirksmeisterschaft Frankfurt'); -- 'District Championships Frankfurt'
INSERT INTO Competitions (geo_area_id, type, label) VALUES (270, 'league', 'Bezirksmeisterschaft Giessen');   -- 'District Championships Giessen'
INSERT INTO Competitions (geo_area_id, type, label) VALUES (271, 'league', 'Bezirksmeisterschaft Kassel');    -- 'District Championships Kassel'

INSERT INTO Competitions (geo_area_id, type, label) VALUES ( 88, 'cup', 'DBB Pokal');              -- 'Country Cup Germany'
-- INSERT INTO Competitions (geo_area_id, type, label) VALUES (250, 'cup', 'Regionenpokal Süd');      -- 'Region Cup South' => DOESN'T EXIST IN REALITY!
INSERT INTO Competitions (geo_area_id, type, label) VALUES (258, 'cup', 'Landespokal Hessen');     -- 'State Cup Hessen'  => PROBLEM HERE! parent should be country cup Germany, but it points to Region South due to the geo areas nature
INSERT INTO Competitions (geo_area_id, type, label) VALUES (268, 'cup', 'Bezirkspokal Darmstadt'); -- 'District Cup Darmstadt'

For the regular-season competitions (leagues) this works pretty well (top entities), however, on the second to last insert into the Competitions table I realized a problem that I hadn't foreseen:

There are competitions for which the "parent competition" - as determined by the geo area parent - doesn't exist. Example: the qualifying teams of the German cup competition for each state go directly to country level, because there is no such thing as a "German regional cup" competition.

By their logic, looking at the geo areas isolated from the competitions, the geo areas themselves are represented correctly I'd say. It's just that competitions sometimes seem to "skip" geo areas. Nothing unsolvable of course, but how do I best adapt to this situation now?

The alternatives that come to my mind:

  1. Enter a dummy cup competition for each region. The Competitions table then needs a BOOLEAN is_dummy column or similar. At runtime this flag had to be checked and if it's a dummy that geo area's parent is returned (possible recursively until a geoa area is found or NULL is returned).
  2. Add another optional parent_geo_area_id to Competitions. At runtime this field is checked: if it has non-NULL value, use that one to return the parent geo area (parent override field), if not use the "normal" geo area reference's parent. The problem with this approach is that the Competitions table already has a foreign key to GeoAreas and that adding another would basically represent a "repeated column" - wouldn't it? Wouldn't this cause a conditional join later on? Is it possible?

Both add redundancy to a certain degree, but I'm not sure how to do it without as the "competitions skip problem" itself causes it.

Maybe other alternatives exist? If not, which alternative would you suggest and why?

Thanks!


Solution

  • It's just that competitions sometimes seem to "skip" geo areas. Nothing unsolvable of course, but how do I best adapt to this situation now?

    Model the way the real world works.

    Don't adapt a model that you already know doesn't work.