Search code examples
pythondatabasesqlitedatabase-designsqlalchemy

SQLAlchemy - Relationship with "indirect" Foreign Key


Context

I'm using SQLAlchemy (v2) to set up a database for an application, and I'm trying to model some relationships on my data. The relationships look similar to this diagram.

ER-Diagram

So I have these hierarchical many to one relationships, where Clusters belong to SubSubtypes, which in turn belong to a Subtype. Samples are similar, but they might only have a Subtype and not a SubSubtype.

Problem

Now I have the problem, that in my cluster table, I only want one reference to the subtype_id (preferably on subsubtype.subtype_id), and similarly in my Sample table I only want to reference subtype_id once.

This should be no problem, when specifying an alternate join condition or a custom FK-relationship, which I've tried.

I have some code (can be found close to the bottom), that does that, but when I run it, it successfully creates the database schema, but fails on inserting the objects. I get a NoForeignKeysError, and SQLAlchemy tells me that

there are no foreign keys linking [the subtype and cluster] tables.

and to

Ensure that referencing columns are associated with a ForeignKey or ForeignKeyConstraint, or specify a 'primaryjoin' expression.

And while indeed, there is no direct connection, I specified a primaryjoin expression and the foreign_keys to use (in various combinations, with and without).

Also, when I subsequently visualize the database schema in the generated file, I get the following Diagram, which seems to suggest, that what I want should work.

DataGrip Diagram

This actually seems like a bug with SQLAlchemy to me (bad error message at the very least), but I've learned that things like these are usually user errors, so I'm coming here for second opinions first.

I've put the full error message (excluding the generated SQL) at the end, since details tags don't seem to work on SO.

I also tried to use the hybrid_property decorator for subtype (and/or subsubtype), which would work for Cluster, but afaics not for Sample, because I still don't see how I could get at the SubSubtypes without the

Question

How can I model these relationships in SQLAlchemy without duplicating the references, and preferably using a declarative style.

Changing the schema would be okay iff it is necessary to achieve the goal.

Similar Questions

I've had a look at these other questions, and while somewhat similar, they don't really help me here.

I haven't found much more that seems relevant, but would be happy about someone pointing me to an applicable solution.

Code

Here's a (somewhat) minimal reproducible example:

from pathlib import Path
from typing import List, Optional

from sqlalchemy import ForeignKey, String, create_engine, event, Engine
from sqlalchemy.orm import Mapped, mapped_column, relationship, DeclarativeBase, Session


class Base(DeclarativeBase):
    pass


@event.listens_for(Engine, "connect")
def sqlite_pragma_enforce_foreign_keys(dbapi_connection, _):
    """
    This is necessary to enforce foreign key constraints on SQLite.
    Cf. `SQLAlchemy docs <https://docs.sqlalchemy.org/en/20/dialects/sqlite.html#foreign-key-support>`_.

    :param dbapi_connection: The database connection.
    :param _: connection_record?
    """
    cursor = dbapi_connection.cursor()
    cursor.execute("PRAGMA foreign_keys=ON")
    cursor.close()


class Sample(Base):
    __tablename__ = "sample"

    id: Mapped[str] = mapped_column(String(16), primary_key=True)
    """A unique identifier for the sample, aka. the "scount"."""

    sequence: Mapped[str] = mapped_column()
    """The actual genome data/consensus sequences. May only contain valid characters, cf. :meth:`_validate_sequence`."""

    subtype_id: Mapped[str] = mapped_column(ForeignKey("subtype.id"))
    """The ID of the sub- or sub-subtype of this sample."""

    subsubtype_id: Mapped[Optional[str]] = mapped_column(ForeignKey("subsubtype.id"))
    """The ID of the sub-subtype of this sample."""

    subtype: Mapped["Subtype"] = relationship(back_populates="samples")
    """The :class:`Subtype` of this sample."""

    subsubtype: Mapped[Optional["SubSubtype"]] = relationship(back_populates="samples")
    """The :class:`SubSubtype` of this sample."""


class Subtype(Base):
    __tablename__ = "subtype"

    id: Mapped[str] = mapped_column(String(3), primary_key=True)
    """The id of this subtype."""

    subsubtypes: Mapped[List["SubSubtype"]] = relationship()
    """A list of :class:`SubSubtype` clades under this subtype."""

    clusters: Mapped[List["Cluster"]] = relationship()
    """A list of :class:`Cluster` objects under this sub-subtype."""

    samples: Mapped[List[Sample]] = relationship()
    """All :class:`Sample` objects of this subtype."""


class SubSubtype(Base):
    __tablename__ = "subsubtype"

    subtype_id: Mapped[str] = mapped_column(ForeignKey("subtype.id"), primary_key=True)
    """Sub-subtypes belong to a :class:`Subtype`, which is their "parent", this is identified by the ``subtype_id``."""

    id: Mapped[str] = mapped_column(String(16), primary_key=True)
    """The sub-subtype specific part of the id."""

    subtype: Mapped[Subtype] = relationship(back_populates="subsubtypes")
    """Sub-subtypes have a :class:`Subtype` as parent."""

    clusters: Mapped[List["Cluster"]] = relationship()
    """A list of :class:`Cluster` objects under this sub-subtype."""

    samples: Mapped[List[Sample]] = relationship()
    """All :class:`Sample` objects of this subtype."""


class Cluster(Base):
    __tablename__ = "cluster"

    subtype_id: Mapped[str] = mapped_column(ForeignKey("subsubtype.subtype_id"), primary_key=True)
    """The ID of the sub- or sub-subtype of this cluster."""

    subsubtype_id: Mapped[str] = mapped_column(ForeignKey("subsubtype.id"), primary_key=True)
    """The ID of the sub-subtype of this cluster."""

    id: Mapped[str] = mapped_column(String(10), primary_key=True)
    """The cluster specific part of the name/id, e.g., in case of "A1_1", it would be "1"."""

    subtype: Mapped["Subtype"] = relationship(
        primaryjoin=subtype_id == Subtype.id,
        foreign_keys=[subtype_id],
        back_populates="clusters",
    )
    """The :class:`Subtype` of this cluster."""

    subsubtype: Mapped["SubSubtype"] = relationship(back_populates="clusters")
    """The :class:`SubSubtype` of this cluster."""


if __name__ == '__main__':
    engine = create_engine("sqlite:///:memory:", echo=True)
    Base.metadata.create_all(engine)

    subtype = Subtype(id="A")
    subsubtype = SubSubtype(subtype_id="A", id="1")
    cluster = Cluster(subtype_id="A", subsubtype_id="1", id="1")

    with Session(engine) as session:
        session.add_all([subtype, subsubtype, cluster])
        session.commit()

Exception

Full output
Traceback (most recent call last):
  File "/home/fynn/Desktop/HIV/HIV-Clustering/venv/lib64/python3.11/site-packages/sqlalchemy/orm/relationships.py", line 2418, in _determine_joins
    self.primaryjoin = join_condition(
                       ^^^^^^^^^^^^^^^
  File "/home/fynn/Desktop/HIV/HIV-Clustering/venv/lib64/python3.11/site-packages/sqlalchemy/sql/util.py", line 123, in join_condition
    return Join._join_condition(
           ^^^^^^^^^^^^^^^^^^^^^
  File "/home/fynn/Desktop/HIV/HIV-Clustering/venv/lib64/python3.11/site-packages/sqlalchemy/sql/selectable.py", line 1358, in _join_condition
    raise exc.NoForeignKeysError(
sqlalchemy.exc.NoForeignKeysError: Can't find any foreign key relationships between 'subtype' and 'cluster'.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/fynn/.config/JetBrains/PyCharm2023.2/scratches/scratch_6.py", line 138, in <module>
    subtype = Subtype(id="A")
              ^^^^^^^^^^^^^^^
  File "<string>", line 4, in __init__
  File "/home/fynn/Desktop/HIV/HIV-Clustering/venv/lib64/python3.11/site-packages/sqlalchemy/orm/state.py", line 561, in _initialize_instance
    manager.dispatch.init(self, args, kwargs)
  File "/home/fynn/Desktop/HIV/HIV-Clustering/venv/lib64/python3.11/site-packages/sqlalchemy/event/attr.py", line 487, in __call__
    fn(*args, **kw)
  File "/home/fynn/Desktop/HIV/HIV-Clustering/venv/lib64/python3.11/site-packages/sqlalchemy/orm/mapper.py", line 4391, in _event_on_init
    instrumenting_mapper._check_configure()
  File "/home/fynn/Desktop/HIV/HIV-Clustering/venv/lib64/python3.11/site-packages/sqlalchemy/orm/mapper.py", line 2386, in _check_configure
    _configure_registries({self.registry}, cascade=True)
  File "/home/fynn/Desktop/HIV/HIV-Clustering/venv/lib64/python3.11/site-packages/sqlalchemy/orm/mapper.py", line 4199, in _configure_registries
    _do_configure_registries(registries, cascade)
  File "/home/fynn/Desktop/HIV/HIV-Clustering/venv/lib64/python3.11/site-packages/sqlalchemy/orm/mapper.py", line 4240, in _do_configure_registries
    mapper._post_configure_properties()
  File "/home/fynn/Desktop/HIV/HIV-Clustering/venv/lib64/python3.11/site-packages/sqlalchemy/orm/mapper.py", line 2403, in _post_configure_properties
    prop.init()
  File "/home/fynn/Desktop/HIV/HIV-Clustering/venv/lib64/python3.11/site-packages/sqlalchemy/orm/interfaces.py", line 579, in init
    self.do_init()
  File "/home/fynn/Desktop/HIV/HIV-Clustering/venv/lib64/python3.11/site-packages/sqlalchemy/orm/relationships.py", line 1636, in do_init
    self._setup_join_conditions()
  File "/home/fynn/Desktop/HIV/HIV-Clustering/venv/lib64/python3.11/site-packages/sqlalchemy/orm/relationships.py", line 1881, in _setup_join_conditions
    self._join_condition = jc = JoinCondition(
                                ^^^^^^^^^^^^^^
  File "/home/fynn/Desktop/HIV/HIV-Clustering/venv/lib64/python3.11/site-packages/sqlalchemy/orm/relationships.py", line 2305, in __init__
    self._determine_joins()
  File "/home/fynn/Desktop/HIV/HIV-Clustering/venv/lib64/python3.11/site-packages/sqlalchemy/orm/relationships.py", line 2439, in _determine_joins
    raise sa_exc.NoForeignKeysError(
sqlalchemy.exc.NoForeignKeysError: Could not determine join condition between parent/child tables on relationship Subtype.clusters - there are no foreign keys linking these tables.  Ensure that referencing columns are associated with a ForeignKey or ForeignKeyConstraint, or specify a 'primaryjoin' expression.

Solution

  • Since SubSubType has a composite Primary Key, Cluster needs a composite Foreign Key. Instead of using ForeignKey() separately on each column, use ForeignKeyConstraint() in __table_args__, like this:

    from sqlalchemy import create_engine, ForeignKey, ForeignKeyConstraint, String
    from sqlalchemy.orm import DeclarativeBase, Mapped, mapped_column
    
    engine = create_engine("sqlite://")
    
    
    class Base(DeclarativeBase):
        pass
    
    
    class SubType(Base):
        __tablename__ = "subtype"
    
        id: Mapped[str] = mapped_column(primary_key=True)
    
    
    class SubSubType(Base):
        __tablename__ = "subsubtype"
    
        subtype_id: Mapped[str] = mapped_column(
            ForeignKey("subtype.id"), primary_key=True
        )
        id: Mapped[str] = mapped_column(String(16), primary_key=True)
    
    
    class Cluster(Base):
        __tablename__ = "cluster"
    
        subtype_id: Mapped[str] = mapped_column(primary_key=True)
        subsubtype_id: Mapped[str] = mapped_column(primary_key=True)
        id: Mapped[str] = mapped_column(String(10), primary_key=True)
    
        __table_args__ = (
            ForeignKeyConstraint(
                [subtype_id, subsubtype_id], [SubSubType.subtype_id, SubSubType.id]
            ),
        )
    
    
    engine.echo = True
    Base.metadata.create_all(engine)
    """
    CREATE TABLE subtype (
        id VARCHAR NOT NULL, 
        PRIMARY KEY (id)
    )
    
    CREATE TABLE subsubtype (
        subtype_id VARCHAR NOT NULL, 
        id VARCHAR(16) NOT NULL, 
        PRIMARY KEY (subtype_id, id), 
        FOREIGN KEY(subtype_id) REFERENCES subtype (id)
    )
    
    CREATE TABLE cluster (
        subtype_id VARCHAR NOT NULL, 
        subsubtype_id VARCHAR NOT NULL, 
        id VARCHAR(10) NOT NULL, 
        PRIMARY KEY (subtype_id, subsubtype_id, id), 
        FOREIGN KEY(subtype_id, subsubtype_id) REFERENCES subsubtype (subtype_id, id)
    )
    """
    

    Here's an example for a similar situation from the SQLAlchemy docs.