I'm using SQLAlchemy (v2) to set up a database for an application, and I'm trying to model some relationships on my data. The relationships look similar to this diagram.
So I have these hierarchical many to one relationships, where Cluster
s belong to SubSubtype
s, which in turn belong to a Subtype
.
Sample
s are similar, but they might only have a Subtype
and not a SubSubtype
.
Now I have the problem, that in my cluster
table, I only want one reference to the subtype_id
(preferably on subsubtype.subtype_id
), and similarly in my Sample
table I only want to reference subtype_id
once.
This should be no problem, when specifying an alternate join condition or a custom FK-relationship, which I've tried.
I have some code (can be found close to the bottom), that does that, but when I run it, it successfully creates the database schema, but fails on inserting the objects.
I get a NoForeignKeysError
, and SQLAlchemy tells me that
there are no foreign keys linking [the subtype and cluster] tables.
and to
Ensure that referencing columns are associated with a ForeignKey or ForeignKeyConstraint, or specify a 'primaryjoin' expression.
And while indeed, there is no direct connection, I specified a primaryjoin
expression and the foreign_keys
to use (in various combinations, with and without).
Also, when I subsequently visualize the database schema in the generated file, I get the following Diagram, which seems to suggest, that what I want should work.
This actually seems like a bug with SQLAlchemy to me (bad error message at the very least), but I've learned that things like these are usually user errors, so I'm coming here for second opinions first.
I've put the full error message (excluding the generated SQL) at the end, since details tags don't seem to work on SO.
I also tried to use the hybrid_property
decorator for subtype
(and/or subsubtype
), which would work for Cluster
, but afaics not for Sample
, because I still don't see how I could get at the SubSubtype
s without the
How can I model these relationships in SQLAlchemy without duplicating the references, and preferably using a declarative style.
Changing the schema would be okay iff it is necessary to achieve the goal.
I've had a look at these other questions, and while somewhat similar, they don't really help me here.
I haven't found much more that seems relevant, but would be happy about someone pointing me to an applicable solution.
Here's a (somewhat) minimal reproducible example:
from pathlib import Path
from typing import List, Optional
from sqlalchemy import ForeignKey, String, create_engine, event, Engine
from sqlalchemy.orm import Mapped, mapped_column, relationship, DeclarativeBase, Session
class Base(DeclarativeBase):
pass
@event.listens_for(Engine, "connect")
def sqlite_pragma_enforce_foreign_keys(dbapi_connection, _):
"""
This is necessary to enforce foreign key constraints on SQLite.
Cf. `SQLAlchemy docs <https://docs.sqlalchemy.org/en/20/dialects/sqlite.html#foreign-key-support>`_.
:param dbapi_connection: The database connection.
:param _: connection_record?
"""
cursor = dbapi_connection.cursor()
cursor.execute("PRAGMA foreign_keys=ON")
cursor.close()
class Sample(Base):
__tablename__ = "sample"
id: Mapped[str] = mapped_column(String(16), primary_key=True)
"""A unique identifier for the sample, aka. the "scount"."""
sequence: Mapped[str] = mapped_column()
"""The actual genome data/consensus sequences. May only contain valid characters, cf. :meth:`_validate_sequence`."""
subtype_id: Mapped[str] = mapped_column(ForeignKey("subtype.id"))
"""The ID of the sub- or sub-subtype of this sample."""
subsubtype_id: Mapped[Optional[str]] = mapped_column(ForeignKey("subsubtype.id"))
"""The ID of the sub-subtype of this sample."""
subtype: Mapped["Subtype"] = relationship(back_populates="samples")
"""The :class:`Subtype` of this sample."""
subsubtype: Mapped[Optional["SubSubtype"]] = relationship(back_populates="samples")
"""The :class:`SubSubtype` of this sample."""
class Subtype(Base):
__tablename__ = "subtype"
id: Mapped[str] = mapped_column(String(3), primary_key=True)
"""The id of this subtype."""
subsubtypes: Mapped[List["SubSubtype"]] = relationship()
"""A list of :class:`SubSubtype` clades under this subtype."""
clusters: Mapped[List["Cluster"]] = relationship()
"""A list of :class:`Cluster` objects under this sub-subtype."""
samples: Mapped[List[Sample]] = relationship()
"""All :class:`Sample` objects of this subtype."""
class SubSubtype(Base):
__tablename__ = "subsubtype"
subtype_id: Mapped[str] = mapped_column(ForeignKey("subtype.id"), primary_key=True)
"""Sub-subtypes belong to a :class:`Subtype`, which is their "parent", this is identified by the ``subtype_id``."""
id: Mapped[str] = mapped_column(String(16), primary_key=True)
"""The sub-subtype specific part of the id."""
subtype: Mapped[Subtype] = relationship(back_populates="subsubtypes")
"""Sub-subtypes have a :class:`Subtype` as parent."""
clusters: Mapped[List["Cluster"]] = relationship()
"""A list of :class:`Cluster` objects under this sub-subtype."""
samples: Mapped[List[Sample]] = relationship()
"""All :class:`Sample` objects of this subtype."""
class Cluster(Base):
__tablename__ = "cluster"
subtype_id: Mapped[str] = mapped_column(ForeignKey("subsubtype.subtype_id"), primary_key=True)
"""The ID of the sub- or sub-subtype of this cluster."""
subsubtype_id: Mapped[str] = mapped_column(ForeignKey("subsubtype.id"), primary_key=True)
"""The ID of the sub-subtype of this cluster."""
id: Mapped[str] = mapped_column(String(10), primary_key=True)
"""The cluster specific part of the name/id, e.g., in case of "A1_1", it would be "1"."""
subtype: Mapped["Subtype"] = relationship(
primaryjoin=subtype_id == Subtype.id,
foreign_keys=[subtype_id],
back_populates="clusters",
)
"""The :class:`Subtype` of this cluster."""
subsubtype: Mapped["SubSubtype"] = relationship(back_populates="clusters")
"""The :class:`SubSubtype` of this cluster."""
if __name__ == '__main__':
engine = create_engine("sqlite:///:memory:", echo=True)
Base.metadata.create_all(engine)
subtype = Subtype(id="A")
subsubtype = SubSubtype(subtype_id="A", id="1")
cluster = Cluster(subtype_id="A", subsubtype_id="1", id="1")
with Session(engine) as session:
session.add_all([subtype, subsubtype, cluster])
session.commit()
Traceback (most recent call last):
File "/home/fynn/Desktop/HIV/HIV-Clustering/venv/lib64/python3.11/site-packages/sqlalchemy/orm/relationships.py", line 2418, in _determine_joins
self.primaryjoin = join_condition(
^^^^^^^^^^^^^^^
File "/home/fynn/Desktop/HIV/HIV-Clustering/venv/lib64/python3.11/site-packages/sqlalchemy/sql/util.py", line 123, in join_condition
return Join._join_condition(
^^^^^^^^^^^^^^^^^^^^^
File "/home/fynn/Desktop/HIV/HIV-Clustering/venv/lib64/python3.11/site-packages/sqlalchemy/sql/selectable.py", line 1358, in _join_condition
raise exc.NoForeignKeysError(
sqlalchemy.exc.NoForeignKeysError: Can't find any foreign key relationships between 'subtype' and 'cluster'.
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/home/fynn/.config/JetBrains/PyCharm2023.2/scratches/scratch_6.py", line 138, in <module>
subtype = Subtype(id="A")
^^^^^^^^^^^^^^^
File "<string>", line 4, in __init__
File "/home/fynn/Desktop/HIV/HIV-Clustering/venv/lib64/python3.11/site-packages/sqlalchemy/orm/state.py", line 561, in _initialize_instance
manager.dispatch.init(self, args, kwargs)
File "/home/fynn/Desktop/HIV/HIV-Clustering/venv/lib64/python3.11/site-packages/sqlalchemy/event/attr.py", line 487, in __call__
fn(*args, **kw)
File "/home/fynn/Desktop/HIV/HIV-Clustering/venv/lib64/python3.11/site-packages/sqlalchemy/orm/mapper.py", line 4391, in _event_on_init
instrumenting_mapper._check_configure()
File "/home/fynn/Desktop/HIV/HIV-Clustering/venv/lib64/python3.11/site-packages/sqlalchemy/orm/mapper.py", line 2386, in _check_configure
_configure_registries({self.registry}, cascade=True)
File "/home/fynn/Desktop/HIV/HIV-Clustering/venv/lib64/python3.11/site-packages/sqlalchemy/orm/mapper.py", line 4199, in _configure_registries
_do_configure_registries(registries, cascade)
File "/home/fynn/Desktop/HIV/HIV-Clustering/venv/lib64/python3.11/site-packages/sqlalchemy/orm/mapper.py", line 4240, in _do_configure_registries
mapper._post_configure_properties()
File "/home/fynn/Desktop/HIV/HIV-Clustering/venv/lib64/python3.11/site-packages/sqlalchemy/orm/mapper.py", line 2403, in _post_configure_properties
prop.init()
File "/home/fynn/Desktop/HIV/HIV-Clustering/venv/lib64/python3.11/site-packages/sqlalchemy/orm/interfaces.py", line 579, in init
self.do_init()
File "/home/fynn/Desktop/HIV/HIV-Clustering/venv/lib64/python3.11/site-packages/sqlalchemy/orm/relationships.py", line 1636, in do_init
self._setup_join_conditions()
File "/home/fynn/Desktop/HIV/HIV-Clustering/venv/lib64/python3.11/site-packages/sqlalchemy/orm/relationships.py", line 1881, in _setup_join_conditions
self._join_condition = jc = JoinCondition(
^^^^^^^^^^^^^^
File "/home/fynn/Desktop/HIV/HIV-Clustering/venv/lib64/python3.11/site-packages/sqlalchemy/orm/relationships.py", line 2305, in __init__
self._determine_joins()
File "/home/fynn/Desktop/HIV/HIV-Clustering/venv/lib64/python3.11/site-packages/sqlalchemy/orm/relationships.py", line 2439, in _determine_joins
raise sa_exc.NoForeignKeysError(
sqlalchemy.exc.NoForeignKeysError: Could not determine join condition between parent/child tables on relationship Subtype.clusters - there are no foreign keys linking these tables. Ensure that referencing columns are associated with a ForeignKey or ForeignKeyConstraint, or specify a 'primaryjoin' expression.
Since SubSubType
has a composite Primary Key, Cluster
needs a composite Foreign Key. Instead of using ForeignKey()
separately on each column, use ForeignKeyConstraint()
in __table_args__
, like this:
from sqlalchemy import create_engine, ForeignKey, ForeignKeyConstraint, String
from sqlalchemy.orm import DeclarativeBase, Mapped, mapped_column
engine = create_engine("sqlite://")
class Base(DeclarativeBase):
pass
class SubType(Base):
__tablename__ = "subtype"
id: Mapped[str] = mapped_column(primary_key=True)
class SubSubType(Base):
__tablename__ = "subsubtype"
subtype_id: Mapped[str] = mapped_column(
ForeignKey("subtype.id"), primary_key=True
)
id: Mapped[str] = mapped_column(String(16), primary_key=True)
class Cluster(Base):
__tablename__ = "cluster"
subtype_id: Mapped[str] = mapped_column(primary_key=True)
subsubtype_id: Mapped[str] = mapped_column(primary_key=True)
id: Mapped[str] = mapped_column(String(10), primary_key=True)
__table_args__ = (
ForeignKeyConstraint(
[subtype_id, subsubtype_id], [SubSubType.subtype_id, SubSubType.id]
),
)
engine.echo = True
Base.metadata.create_all(engine)
"""
CREATE TABLE subtype (
id VARCHAR NOT NULL,
PRIMARY KEY (id)
)
CREATE TABLE subsubtype (
subtype_id VARCHAR NOT NULL,
id VARCHAR(16) NOT NULL,
PRIMARY KEY (subtype_id, id),
FOREIGN KEY(subtype_id) REFERENCES subtype (id)
)
CREATE TABLE cluster (
subtype_id VARCHAR NOT NULL,
subsubtype_id VARCHAR NOT NULL,
id VARCHAR(10) NOT NULL,
PRIMARY KEY (subtype_id, subsubtype_id, id),
FOREIGN KEY(subtype_id, subsubtype_id) REFERENCES subsubtype (subtype_id, id)
)
"""
Here's an example for a similar situation from the SQLAlchemy docs.