This question is about relational databases like postgresql or oracle. The following is a toy example of my problem. Say I have five tables
Client (id, name)
1 Joe
2 Ted
Factory (id, name, salary)
1 BMW 20
2 Porsche 30
Farm (id, name, salary)
1 Wineyard 10
2 Cattle farm 5
Occupation
1 Farmer
2 Worker
Client_Occupation (client-id, occupation-id, dependent-id)
1 (Joe) 1 (Farmer) 1 (Wineyard)
2 (Ted) 2 (Worker) 2 (Porsche)
To find out how much Joe earns, the sql needs to use data from either the table farms (if the occupation id is 1) or factories (if the occupation id is 2). This is creating very convoluted queries and makes the processing code more complex than it should be.
Is there a better way to structure this data if I do not want to merge the factories and farms tables?
In other words: the table client_ocupation has a conditional relation to either Farm or Factory. Is there a better way to represent this information?
Client
^
|
Factory <- Client_Occupation -> Farm
|
v
Occupation
This comes up often when modeling OO hierarchies, and is a table inheritance pattern. Generally, I prefer Concrete Table Inheritance over Single Table Inheritance, but they both have their use cases.
Single Table Inheritance is really easy - just take your Farm
and Factory
tables and merge them together for a superset. Done and done. Unfortunately, those niceties things like constraints become difficult, and you'll find yourself writing a lot of CASE
expressions on the discriminator column.
Concrete Table Inheritance takes a little time to grok, but is actually pretty simple as well. You design a base table w/common attributes and a discriminator column, and then "sub" tables w/specific attributes for each level of the hierarchy. The sub tables are linked back to the base table by Id and Discriminator - which provides some query optimizer gains and prevents a single entity from being multiple types.
For your example problem, you really don't have any specific attributes on the leaves - so this will look a little odd, but the pattern still works (note that I'm renaming "dependent-id" to "location-id" to make it more understandable):
Occupation (id, name)
1 Farmer
2 Worker
# the base table for all locations w/common attributes
Location (id, type, name, salary)
# unique constraint on discriminator column, and another on the combination of id and type so that we can reference in a foreign key
PK(id), UC(type), UC(id, type)
1 Farm Wineyard 10
2 Farm Cattle farm 5
3 Factory BMW 20
4 Factory Porsche 30
Client_Occupation (client-id, occupation-id, location-id)
FK location-id => location.id
1 (Joe) 1 (Farmer) 3 (Wineyard)
2 (Ted) 2 (Worker) 1 (Porsche)
Farm
# carry the discriminator column and check constraint it; any Farm specific columns can be added here
PK(id), FK(id, type) => location(id, type), CHECK type = 'Farm'
Factory (id, type)
1 Farm
2 Farm
Factory
# carry the discriminator column and check constraint it; any Factory specific columns can be added here
PK(id), FK(id, type) => location(id, type), CHECK type = 'Factory'
Factory (id, type)
3 Factory
4 Factory
You can now easily get salary for everything applicable:
SELECT * FROM Client_Occupation CO JOIN Location L ON CO.location-id = L.id
And, if you want to get specific Farm
columns, or limit it to Farm
locations:
SELECT * FROM Client_Occupation CO JOIN Location L ON CO.location-id = L.id
JOIN Farm F ON CO.id = F.id and CO.type = F.type // technically, joining on type is unnecessary, but I prefer to include it anyway
And, if you want to emulate the Single Table inheritance model:
SELECT * FROM Client_Occupation CO JOIN Location L ON CO.location-id = L.id
LEFT OUTER JOIN Farm F ON CO.id = F.id and CO.type = F.type
LEFT OUTER JOIN Factory T ON CO.id = T.id and CO.type = T.type
On occasion, it's difficult to refactor an existing table design into this pattern. In that case, a view for your base table provides a lot of the same querying benefits:
CREATE VIEW Location AS
SELECT id, 'Farm' as type, name, salary FROM Farm
UNION ALL
SELECT id, 'Factory' as type, name, salary FROM Factory
At that point, you will (probably) have conflicting ids (eg., a FarmId = 1 and FactoryId = 1) so you'll need to be diligent about including the type
column in any joins.