Search code examples
sqljoinlookupsurrogate-keynatural-key

Lookup Table -- Natural or Surrogate key as primary key?


I have a table for recording license usage. Each license usage needs to be associated with a user and a host machine. The table definition looks like this.

create table if not exists  per_user_fact
(
    per_user_fact_id        int unsigned        not null    auto_increment,
    time_of_day             char(16)            not null,
    license_served_id       smallint unsigned   not null,
    license_hours           numeric(10,2)       not null,
    role_name               varchar(64)         null,
    user                    varchar(128)        not null,
    host                    varchar(128)        not null,
    primary key (per_user_fact_id),
    foreign key (license_served_id) references served_license(served_license_id),
    foreign key (user, host) references user_host(username, hostname)
);

I want to normalize this table so that the user/host values, which are repetitive, are moved to a new table like this.

create table if not exists  user_host
(
    username                varchar(64)         not null,
    hostname                varchar(128)        not null,
    primary key (username, hostname)
);

For the user_host table, what kind of primary key should I pick up - natural or surrogate? I can think of the following governing factors.

  1. If the primary key is natural, that is a composite of user and host names, the parent table per_user_fact will not need additional joins to find out user and host names.
  2. If the primary key is natural, there will be waste of storage as the user and host names values will be duplicated in both the tables.
  3. If the primary key is surrogate, the additional join will be needed for the parent table to get the values for user and host names.
  4. If the primary key is surrogate, the indexing on the user_host table will be faster.

Please advise.


Solution

  • I am a big fan of using surrogate primary keys, even in this case. The cost of the additional join is negligible when you are joining to a clustered primary key.

    In addition, assuming that username and hostname are (together) longer than four or so characters, the surrogate key saves space. In fact, you might find that a surrogate key results in faster queries because the data in per_user_fact is smaller. Smaller tables occupy fewer data pages, resulting in fewer I/Os.

    Another advantage of a surrogate key is the opportunity to change user names and host names without modifying any other tables. If you are using data fields for joins, then modifying the values requires updating multiple tables -- a more cumbersome operation.

    I also like surrogate identity/serial/auto increment keys because they also capture insertion order in the table. Of course, there are other methods (my tables typically have a CreatedAt column which defaults to the insertion time). But, the surrogate key can also play that role.

    These reasons do not constitute a "right" answer to the question. There are valid reasons for not using surrogates. For me, though, almost all my tables have such primary keys.