postgresql email database-design relational-database datamodel

How do I "link" two tables when the primary key of table_a could exist in none, one or many columns in table_b?

How do I link two tables, People and Messages, when there is no single column in the Messages table that corresponds to the primary key of the People table?

I have two tables: one containing email messages called Messages and one of people called People.

People

person_id	email_address
001	`billy@gmail.com`
002	`bobby@gmail.com`
003	`gregg@gmail.com`
004	`janet@gmail.com`
005	`boss1@gmail.com`

Messages

msg_id	from_addr	to_addr	cc_addr	bcc_addr
001	`billy@gmail.com`	`gregg@gmail.com`, `david@gmail.com`	`boss1@gmail.com`	NULL
002	`bobby@gmail.com`	`gregg@gmail.com`	NULL	NULL
003	`janet@gmail.com`	`boss1@gmail.com`	`bobby@gmail.com`	`gregg@gmail.com`

I will be programmatically passed a person_id (say 005 for boss1@gmail.com) and want to be able to query the Messages dataset to get all "involved" emails--where a boss1@gmail.com was sent, received, was cc'd or bcc'd in the email message. So with the dataset above, the query ought to return the rows with msg_id 001 and 003.

I don't know to map the relation between the People and Messages tables.

Say the primary key of of the People table is person_id. There is no single column that I can link that to in the Messages table to act as a foreign key.

I cannot determine how to represent this data correctly.

Am I doing this wrong? What is a better way to represent these two tables with more tables structured differently?

After normalising to 1NF:

Messages_FNF

msg_id	from_addr	to_addr	cc_addr	bcc_addr
001	`billy@gmail.com`	`gregg@gmail.com`	`boss1@gmail.com`	NULL
001	`billy@gmail.com`	`david@gmail.com`	`boss1@gmail.com`	NULL
002	`bobby@gmail.com`	`gregg@gmail.com`	NULL	NULL
003	`janet@gmail.com`	`boss1@gmail.com`	`bobby@gmail.com`	`gregg@gmail.com`

However, the primary key (that uniquely identifies each row in the table) for Messages_FNF would have to be a composite key combining msg_id, from_addr, to_addr, cc_addr, bcc_addr.

So how do I link the People and Messages tables together using Primary and Foreign keys?

Or must I further normalise or partition these tables such that the People table and Messages_FNF have a meaningful matching field - allowing for easy joins? If so, how?

I've read that the cross join lets you join tables that do not have a common field but I worry that it will result in a VERY large table with unnecessary rows and be computationally expensive.

Is linking the tables with a PK->FK the right way of structuring the data in this problem?

Solution

To put OP out of their misunderstanding on the main point:

So how do I link the People and Messages tables together using Primary and Foreign keys? ...

I've read that the cross join lets you join tables that do not have a common field ...

To "link tables together" does not need going via keys -- not necessarily any key in any table. The most common linkages in a 1:many join are from Foreign key in the many side to Primary key in the 1 side. That's why those relationships carry those names.

That Primary, Foreign key business is a result of 'normalising' the data model, aimed at exactly avoiding 'update anomalies' and making joining straightforward.

Joins going via non-key fields are sometimes call 'cross joins' to distinguish them -- but they're really all just joins, and you can use sql keyword join ... on ....

OP has now revised the q to show Messages_FNF. To repeat: no need to go via keys, so you could join People.email_address to each/any of the three fields holding email addresses.

Whilst this design is in 1NF (under a very generous interpretation), it is hardly a solution with all that content repeated, including repeating a load of useless NULLs. You'll suffer update anomalies (see the wikipage); you'll also indeed get a blowup in numbers of rows.

You need to vertically split off the columns that have repeating (or nullable) fields. Retain from_addr in the Messages table (because it must have exactly one from address). Create three extra tables each with a compound key of {message_id, to_addr} etc, in which message_id is a Foreign key to (REFERENCES) Messages.

Now your sql can't use join ... on ... because you're looking for a match in any of the four tables, not all. You'll need a disjunctive where ... = ... or ... = ... or ... query. Exercise left for the reader: that's a straightforward piece of sql.

Addit: in response to comment

Each of these three table will have rows upon rows of email addresses that already exist in the People table.

Do they already exist in People table? You didn't say that in the original q. I get messages from all sorts of people I've never emailed before; I certainly don't know who else might get bcc'd on messages I receive.

So if you have a business rule that anybody's email address appearing anywhere on any message is to automatically get an entry in the People table, yes your auxiliary tables could carry msg_id, person_id. You'll still have "rows upon rows" of person_id.

You'll still have to query all those Messages auxiliary datasets to search for person_id; and it still won't be a whole Primary key of any table. (You seem to be worried about efficiency/effort: I'd guess not much saving, on modern DBMSs.)