Referential Integrity - how to configure SQLAlchemy?

Can someone help me get the big picture when it comes to configuring SQLAlchemy models, so referential integrity is ensured at all levels?

Going by the idea that referential integrity should be expressed and enforced by the DB, I have created a schema (currently in Postgresql) with all the constraints I think I need, thus giving me a database I trust will enforce referential integrity.

I then begin to build an app on top of this DB, using SQLAlchemy (0.7) in declarative mode.

Having searched and read a bit, I have learned that I can configure:

onupdate/ondelete rules on my Column() definitions.
cascade options on my relationship() definitions,
and that these seem to operate on the session level in SQLAlchemy.
passive_deletes and passive_updates options for my relationship() definitions.

And that all these options have defaults.

But I am left confused as to how much I actually need to do with my SQLAlchemy models, to make sure SQLAlchemy doesn't get out of sync with the DB and its constraints during a session.

What exactly am I achieving, if I configure 'onupdate' etc. on my Columns() definitions in SQLAlchemy?

And for the cascade and passive_delete/passive_update rules I can configure on a relationship(). What do I need here, and why?

Or to rephrase my question: To what extend will SQLAlchemy be aware of the constraints configured in the DB schema, and to what extend (and how) do I have to repeat them in my models?

And are the anything else I should be aware of? :)

Solution

SQLAlchemy doesn't fundamentally need to know anything about the constraints your database has. If your database has the constraints you want configured, you're essentially done - your application is disallowed by the database from doing anything it's not supposed to.

A key theme of SQLAlchemy is that it is really only doing what you tell it to. So if you were to try persisting an object, SubWidget(), which in the database needs to have a reference to a parent Widget(), at the point at which SQLAlchemy flushes the data (i.e. emits INSERT statements), the operation would fail with a constraint violation, emitted by the database, and the transaction is rolled back.

So assuming an FK on "subwidget" referencing "widget", your application needs to ensure the data is in the correct structure. There are two ways to do this; one is that you hand-maintain those columns which contain foreign key references, and ensure they have the appropriate value at the point of INSERT or UPDATE. The other is that you'd use relationship() to manage the foreign key attribute, and you'd instead ensure that the creation of a SubWidget() object is accompanied by the operation of associating it with a parent Widget() object which you've created and/or acquired separately.

Regarding cascades, it's a great idea, though not required, to have ON DELETE CASCADE on those foreign keys where it applies. On the SQLAlchemy side, when using relationship() you usually want to give the ORM a hint that the database will cascade deletes via the passive_deletes flag ( http://www.sqlalchemy.org/docs/orm/collections.html?highlight=passive_deletes#using-passive-deletes ), however this is usually a performance enhancement; SQLAlchemy otherwise ensures that all objects represented on the dependent side of the relationship() are loaded into memory, and handled appropriately, which means one of either setting the foreign key attribute to NULL (the default), or marking the dependent object for deletion (which occurs by setting "cascade" to "all, delete-orphan", see http://www.sqlalchemy.org/docs/orm/session.html#cascades).

ON UPDATE cascade is less common as natural primary keys are not a common practice these days, since they don't actually perform as well as plain integer primary keys and can also be cumbersome in other ways. SQLAlchemy does support these as well however and they will generally take care of themselves, as SQLA assumes by default that update cascades are in place when a PK mutation takes place, see http://www.sqlalchemy.org/docs/orm/relationships.html#mutable-primary-keys-update-cascades for a detailed description of that.

Perhaps this is all easier to grok with a little experimentation, the basic idea is that SQLAlchemy is only emitting the SQL you're telling it to, even though many of its SQL behaviors are automated once configured up front. relationship() should be configured with details on how you'd like it to behave when data is persisted, modified, or deleted with respect to the constraints present in the database.