How do I link two tables, People
and Messages
, when there is no single column in the Messages
table that corresponds to the primary key of the People
table?
I have two tables: one containing email messages called Messages
and one of people called People
.
People
person_id | email_address |
---|---|
001 | billy@gmail.com |
002 | bobby@gmail.com |
003 | gregg@gmail.com |
004 | janet@gmail.com |
005 | boss1@gmail.com |
Messages
msg_id | from_addr | to_addr | cc_addr | bcc_addr |
---|---|---|---|---|
001 | billy@gmail.com |
gregg@gmail.com , david@gmail.com |
boss1@gmail.com |
NULL |
002 | bobby@gmail.com |
gregg@gmail.com |
NULL | NULL |
003 | janet@gmail.com |
boss1@gmail.com |
bobby@gmail.com |
gregg@gmail.com |
I will be programmatically passed a person_id
(say 005
for boss1@gmail.com
) and want to be able to query the Messages
dataset to get all "involved" emails--where a boss1@gmail.com
was sent, received, was cc'd or bcc'd in the email message. So with the dataset above, the query ought to return the rows with msg_id
001
and 003
.
I don't know to map the relation between the People
and Messages
tables.
Say the primary key of of the People
table is person_id
. There is no single column that I can link that to in the Messages
table to act as a foreign key.
I cannot determine how to represent this data correctly.
Am I doing this wrong? What is a better way to represent these two tables with more tables structured differently?
After normalising to 1NF:
Messages_FNF
msg_id | from_addr | to_addr | cc_addr | bcc_addr |
---|---|---|---|---|
001 | billy@gmail.com |
gregg@gmail.com |
boss1@gmail.com |
NULL |
001 | billy@gmail.com |
david@gmail.com |
boss1@gmail.com |
NULL |
002 | bobby@gmail.com |
gregg@gmail.com |
NULL | NULL |
003 | janet@gmail.com |
boss1@gmail.com |
bobby@gmail.com |
gregg@gmail.com |
However, the primary key (that uniquely identifies each row in the table) for Messages_FNF
would have to be a composite key combining msg_id
, from_addr
, to_addr
, cc_addr
, bcc_addr
.
So how do I link the People
and Messages
tables together using Primary and Foreign keys?
Or must I further normalise or partition these tables such that the People
table and Messages_FNF
have a meaningful matching field - allowing for easy joins? If so, how?
I've read that the cross join lets you join tables that do not have a common field but I worry that it will result in a VERY large table with unnecessary rows and be computationally expensive.
Is linking the tables with a PK->FK the right way of structuring the data in this problem?
To put OP out of their misunderstanding on the main point:
So how do I link the People and Messages tables together using Primary and Foreign keys? ...
I've read that the cross join lets you join tables that do not have a common field ...
To "link tables together" does not need going via keys -- not necessarily any key in any table. The most common linkages in a 1:many join are from Foreign key in the many side to Primary key in the 1 side. That's why those relationships carry those names.
That Primary, Foreign key business is a result of 'normalising' the data model, aimed at exactly avoiding 'update anomalies' and making joining straightforward.
Joins going via non-key fields are sometimes call 'cross joins' to distinguish them -- but they're really all just joins, and you can use sql keyword join ... on ...
.
OP has now revised the q to show Messages_FNF
. To repeat: no need to go via keys, so you could join People.email_address
to each/any of the three fields holding email addresses.
Whilst this design is in 1NF (under a very generous interpretation), it is hardly a solution with all that content repeated, including repeating a load of useless NULL
s. You'll suffer update anomalies (see the wikipage); you'll also indeed get a blowup in numbers of rows.
You need to vertically split off the columns that have repeating (or nullable) fields. Retain from_addr
in the Messages
table (because it must have exactly one from address). Create three extra tables each with a compound key of {message_id, to_addr}
etc, in which message_id
is a Foreign key to (REFERENCES
) Messages
.
Now your sql can't use join ... on ...
because you're looking for a match in any of the four tables, not all. You'll need a disjunctive where ... = ... or ... = ... or ...
query. Exercise left for the reader: that's a straightforward piece of sql.
Addit: in response to comment
Each of these three table will have rows upon rows of email addresses that already exist in the People table.
Do they already exist in People
table? You didn't say that in the original q. I get messages from all sorts of people I've never emailed before; I certainly don't know who else might get bcc'd on messages I receive.
So if you have a business rule that anybody's email address appearing anywhere on any message is to automatically get an entry in the People
table, yes your auxiliary tables could carry msg_id, person_id
. You'll still have "rows upon rows" of person_id
.
You'll still have to query all those Messages
auxiliary datasets to search for person_id
; and it still won't be a whole Primary key of any table. (You seem to be worried about efficiency/effort: I'd guess not much saving, on modern DBMSs.)