Search code examples
sqldatabasepostgresqldatabase-designdatabase-normalization

Many-to-many relationship without intersection table?


I am setting up a database where I'd like to have many-to-many relationships between some tables. There's no user interface for this database; we will be putting data into the tables using R scripts and retrieving it using Python scripts.

The entities involved are projects and cost forecasts. Multiple projects may use the same forecast. For each forecast, there are costs to develop a project in each of several future years. I need to be able to retrieve the cost forecast for each future year for each individual project.

I think the tables below would be a fairly standard way to represent these relationships. Note that "pk" means "primary key" and "fk" means "foreign key".

PROJECT
  name
  forecast_id (fk)

FORECAST
  forecast_id (pk)

COST
  forecast_id (fk)
  year
  cost

To retrieve the forecast for a particular project, I would just retrieve all the rows from COST that have a matching forecast_id. I don't need the FORECAST table for anything, except as a home for the forecast_id that establishes the many-to-many relationship between PROJECT and COST.

So my main question is, can I just drop the FORECAST table and have a direct many-to-many relationship between PROJECT and COST, using the forecast_id? I know this is physically possible, but many discussions use language along the lines that "many-to-many relationships aren't possible without a bridge table." But why would I want to add the bridge table, if I can do all my queries without it and it is one more table I would have to maintain?

Going further, many discussions of many-to-many relationships (including @mike-organek's comment below) suggest a structure similar to this:

PROJECT
  project_id (pk)
  name

PROJECT_COST
  project_id (fk)
  cost_id (fk)

COST
  cost_id (pk)
  year
  cost

While this seems like a commonly preferred approach, it suits my needs even less well. Now every time I add a new project, instead of just assigning the forecast_id corresponding to a particular forecast, I have to add a bunch of link records to the PROJECT_COST table, one for each future year. This will also require a lot of management, and allows potential creation of relationships I don't want (e.g., one project uses costs from one forecast for the first two years, then costs from a different forecast for the next two years).

So my second question is, is there anything preferable about the second approach over the first approach, or over my simplified approach (using just the PROJECT and COST tables)?

Update

There seems to be some confusion about what I'm asking here. So I've revised the question significantly to try to make it clearer. Note that I renamed cost_group to forecast as part of this.


Solution

  • The second approach (with the project_cost table containing two foreign keys) is the correct way to model a many-to-many relationship.

    But your idea with the shared forecast_id (with or without forecast table) exhibits that you are not thinking of a many-to-many relationship in the ordinary sense: if one project is associated with a certain set of costs, all other projects must either be associated with the same or a disjoint set of costs.

    If that is what you want, I see no problem with removing the forecast table. There is no referential integrity you are losing that way.

    If you have additional requirements, for example that there has to be at least a cost and a project for each existing forecast_id, things may change. That could be guaranteed with foreign keys from the forecast table, but not without that table.