Search code examples
sql-serverforeign-keysrdbmssql-server-2017check-constraints

Ensuring that two column values are related in SQL Server


I'm using Microsoft SQL Server 2017 and was curious about how to constrain a specific relationship. I'm having a bit of trouble articulating so I'd prefer to share through an example.

Consider the following hypothetical database.


Customers
+---------------+
|  Id  |  Name  |
+---------------+
|  1   |  Sam   |
|  2   |  Jane  |
+---------------+
Addresses
+----------------------------------------+
|  Id  |  CustomerId  |  Address         |
+----------------------------------------+
|  1   |  1           |  105 Easy St     |
|  2   |  1           |  9 Gale Blvd     |
|  3   |  2           |  717 Fourth Ave  |
+------+--------------+------------------+
Orders
+-----------------------------------+
|  Id  |  CustomerId  |  AddressId  |
+-----------------------------------+
|  1   |  1           |  1          |
|  2   |  2           |  3          |
|  3   |  1           |  3          |  <--- Invalid Customer/Address Pair
+-----------------------------------+

Notice that the final Order links a customer to an address that isn't theirs. I'm looking for a way to prevent this.

(You may ask why I need the CustomerId in the Orders table at all. To be clear, I recognize that the Address already offers me the same information without the possibility of invalid pairs. However, I'd prefer to have an Order flattened such that I don't have to channel through an address to retrieve a customer.)

From the related reading I was able to find, it seems that one method may be to enable a CHECK constraint targeting a User-Defined Function. This User-Defined Function would be something like the following:

WHERE EXISTS (SELECT 1 FROM Addresses WHERE Id = Order.AddressId AND CustomerId = Order.CustomerId)

While I imagine this would work, given the somewhat "generality" of the articles I was able to find, I don't feel entirely confident that this is my best option.

An alternative might be to remove the CustomerId column from the Addresses table entirely, and instead add another table with Id, CustomerId, AddressId. The Order would then reference this Id instead. Again, I don't love the idea of having to channel through an auxiliary table to get a Customer or Address.

Is there a cleaner way to do this? Or am I simply going about this all wrong?


Solution

  • Good question, however at the root it seems you are struggling with creating a foreign key constraint to something that is not a foreign key:

    Orders.CustomerId -> Addresses.CustomerId

    There is no simple built-in way to do this because it is normally not done. In ideal RDBMS practices you should strive to encapsulate data of specific types in their own tables only. In other words, try to avoid redundant data.

    In the example case above the address ownership is redundant in both the address table and the orders table, because of this it is requiring additional checks to keep them synchronized. This can easily get out of hand with bigger datasets.

    You mentioned:

    However, I'd prefer to have an Order flattened such that I don't have to channel through an address to retrieve a customer.

    But that is why a relational database is relational. It does this so that distinct data can be kept distinct and referenced with relative IDs.

    I think the best solution would be to simply drop this requirement.

    In other words, just go with:

    Customers
    +---------------+
    |  Id  |  Name  |
    +---------------+
    |  1   |  Sam   |
    |  2   |  Jane  |
    +---------------+
    
    Addresses
    +----------------------------------------+
    |  Id  |  CustomerId  |  Address         |
    +----------------------------------------+
    |  1   |  1           |  105 Easy St     |
    |  2   |  1           |  9 Gale Blvd     |
    |  3   |  2           |  717 Fourth Ave  |
    +------+--------------+------------------+
    
    Orders
    +--------------------+
    |  Id  |  AddressId  |
    +--------------------+
    |  1   |  1          |
    |  2   |  3          |
    |  3   |  3          |  <--- Valid Order/Address Pair
    +--------------------+
    

    With that said, to accomplish your purpose exactly, you do have views available for this kind of thing:

    create view CustomerOrders
    as
    
    select  o.Id OrderId,
            a.CustomerId,
            o.AddressId
    from    Orders
    join    Addresses a on a.Id = o.AddressId
    

    I know this is a pretty trivial use-case for a view but I wanted to put in a plug for it because they are often neglected and come in handy with organizing big data sets. Using WITH SCHEMABINDING they can also be indexed for performance.