Search code examples
pythonsqlalchemysqlmodel

Imposing uniqueness constraint of column values between multiple tables in SQLModel


Apologies if this question has been answered elsewhere: I've had no luck when searching. I'm also a SQL newbie, so I hope the answer to my question isn't too elementary.

I'm creating a database to hold experimental parameters for our lab and I'm doing so using SQLModel. In a bare-bones example of my question, I have two tables, WaveformA and WaveformB:

from sqlmodel import Field, SQLModel

#from sqlmodel import  Column, Relationship, CheckConstraint, Integer, UniqueConstraint
# from sqlalchemy.ext.hybrid import hybrid_property

class WaveformA(SQLModel, table=True):
    __tablename__ = "waveforms_a"

    id: int = Field(
        primary_key=True,
    )
    name: str = Field(
        unique=True,
        description="Unique name to identify a waveform of type A"
    )
    # other columns particular to waveforms of type A, that may not be present in Waveform B

    def __repr__(self):
        return f"Waveform A with id {self.id} and name {self.name}"


class WaveformB(SQLModel, table=True):
    __tablename__ = "waveforms_b"

    id: int = Field(
        primary_key=True,
    )
    name: str = Field(
        unique=True,
        description="Unique name to identify a waveform of type B"
    )
    # other columns particular to waveforms of type B, that may not be present in Waveform A

    def __repr__(self):
        return f"Waveform B with id {self.id} and name {self.name}"

I'd like to impose a constraint on both WaveformA and WaveformB together, at the table level, to ensure uniqueness between the union of all name values appearing in both of these tables.

The reason I'd like to do this, is because all rows in WaveformA are fed into a mathematical function on the backend, and all rows in WaveformB are fed into a different mathematical function on the backend, when an experiment is queued on our hardware. Apart from the name column of each table, WaveformA and WaveformB generally have different columns because their functions are different (e.g. WaveformA may have a column "standard deviation" and not "gain", whereas WaveformB has "gain" but not "standard deviation"), so it makes sense to split the waveforms into two tables. However, after reading all of the rows of both tables, inputting the column values into the appropriate function and grabbing the outputs of each function for each table, I need to combine the results for both tables into a single Python dictionary, with the name column values as keys and the outputs from these functions as the corresponding values. This is the input Python dictionary the hardware requires, and obviously the keys need to all be unique, hence the need for uniqueness for the name column between multiple tables.

It is trivial to impose uniqueness for a column in a single table (the "unique=True" parameter in the Field instances). I've also tried looking into UniqueConstraint and adding one to _table_args_ of each waveform, but this only appears useful in creating uniqueness constraints between multiple columns in the same table, not columns between tables.

Python itself would return an error in my compilation function that creates the aforementioned dictionary, so this error wouldn't be missed. But I'd like to impose such a constraint so that any user in our lab doesn't inadvertently duplicate the name given to an existing waveform row (to ensure the integrity of our database).

I'm not sure if such constraints are possible even in SQL itself, so I'd appreciate any help!


Solution

  • Some other options that might work are:

    1. collapse the tables and use a type column to differentiate, leaving unused columns as null, a check constraint could be used with the type ie. CheckConstraint("(type == 'b' AND gain IS NOT NULL) OR (type == 'a' AND standard_deviation IS NOT NULL)")
    2. Keep two tables but move name to third table and make that unique and have two foreign keys with a check constraint enforcing integrity of two foreign keys pointing to both waveforms_b and waveforms_a. ie. CheckConstraint("(type == 'b' AND waveforms_b_id IS NOT NULL AND waveforms_a_id IS NULL) OR (type == 'a' AND waveforms_a_id IS NOT NULL AND waveforms_b_id IS NULL")

    I think using a single table seems like it would be the easiest but I guess I don't fully understand the use of name or how many columns are actually involved. This seems like a version of inheritance, which I think in most cases using a single table with a type column is the best for most cases.

    Docs for inheritance