Search code examples
sqlsqlalchemymany-to-manyparent-child

How to remove SQLAlchemy Many-To-Many Orphans from database?


Context

I have a simple MySQL database written with SQLAlchemy. The following are my two models, Subreddit and Keyword, that have a many-to-many relationship, along with their association table:

subreddits_keywords = db.Table('subreddits_keywords', db.Model.metadata,
    db.Column('subreddit_id', db.Integer, db.ForeignKey('subreddits.id', ondelete='CASCADE')),
    db.Column('keyword_id', db.Integer, db.ForeignKey('keywords.id',  ondelete='CASCADE')),
)

class Subreddit(db.Model, JsonSerializer):
    __tablename__ = 'subreddits'

    id = db.Column(db.Integer, primary_key=True)
    subreddit_name = db.Column(db.String(128), index=True)

    # Establish a parent-children relationship (subreddit -> keywords).
    keywords = db.relationship('Keyword', secondary=subreddits_keywords, backref='subreddits', cascade='all, delete',  passive_deletes=True, lazy='dynamic')
    // ...


class Keyword(db.Model, JsonSerializer):
    __tablename__ = 'keywords'

    id = db.Column(db.Integer, primary_key=True)
    keyword = db.Column(db.String(128), index=True)

    // ...

As test data, I've created the following data set:

Subreddit:
test_subreddit

Keywords:
test_keyword1
test_keyword2
test_keyword3

In other words, test_subreddit.keywords should return [test_keyword1, test_keyword2, test_keyword3].

Problem

When I remove test_subreddit, test_keyword1, test_keyword2, test_keyword3 still persist in the database.

I understand that with many-to-many relationships, there is technically no parent so cascade's technically will not work according to this post: https://stackoverflow.com/a/803584/10426919.

What I've Tried

I followed this link: https://github.com/sqlalchemy/sqlalchemy/wiki/ManyToManyOrphan.

This link provides a library function that should fix my exact problem.

However, the function does not work when integrated into my Model file in the following ways:

Method #1:

from app.extensions import db
from werkzeug.security import generate_password_hash, check_password_hash
from sqlalchemy.inspection import inspect
from sqlalchemy_utils import auto_delete_orphans <------ # library 

subreddits_keywords = db.Table('subreddits_keywords', db.Model.metadata,
    db.Column('subreddit_id', db.Integer, db.ForeignKey('subreddits.id', ondelete='CASCADE')),
    db.Column('keyword_id', db.Integer, db.ForeignKey('keywords.id',  ondelete='CASCADE')),
)

class Subreddit(db.Model, JsonSerializer):
    __tablename__ = 'subreddits'

    id = db.Column(db.Integer, primary_key=True)
    subreddit_name = db.Column(db.String(128), index=True)

    # Establish a parent-children relationship (subreddit -> keywords).
    keywords = db.relationship('Keyword', secondary=subreddits_keywords, backref='subreddits', cascade='all, delete',  passive_deletes=True, lazy='dynamic')
    // ...


class Keyword(db.Model, JsonSerializer):
    __tablename__ = 'keywords'

    id = db.Column(db.Integer, primary_key=True)
    keyword = db.Column(db.String(128), index=True)

    // ...

auto_delete_orphans(Subreddit.keywords) <------ # Library function

However, this function does not seem to do anything. There is no error that is output to help guide me towards the right direction. When I check my database in MySQL workbench, the Subreddit, test_subreddit, is deleted, but the keywords [test_keyword1, test_keyword2, test_keyword3] are still in the database under the Keywords table.

Method #2:

I tried integrating the actual function, that the library function is based on, into my code as well:

from app.extensions import db
from werkzeug.security import generate_password_hash, check_password_hash
from sqlalchemy.inspection import inspect
from sqlalchemy_utils import auto_delete_orphans
# for deleting many-to-many "orphans".
from sqlalchemy import event, create_engine
from sqlalchemy.orm import attributes, sessionmaker

subreddits_keywords = db.Table('subreddits_keywords', db.Model.metadata,
    db.Column('subreddit_id', db.Integer, db.ForeignKey('subreddits.id', ondelete='CASCADE')),
    db.Column('keyword_id', db.Integer, db.ForeignKey('keywords.id',  ondelete='CASCADE')),
)

class Subreddit(db.Model, JsonSerializer):
    __tablename__ = 'subreddits'

    id = db.Column(db.Integer, primary_key=True)
    subreddit_name = db.Column(db.String(128), index=True)

    # Establish a parent-children relationship (subreddit -> keywords).
    keywords = db.relationship('Keyword', secondary=subreddits_keywords, backref='subreddits', cascade='all, delete',  passive_deletes=True, lazy='dynamic')
    // ...


class Keyword(db.Model, JsonSerializer):
    __tablename__ = 'keywords'

    id = db.Column(db.Integer, primary_key=True)
    keyword = db.Column(db.String(128), index=True)

    // ...

engine = create_engine("mysql://", echo=True)
Session = sessionmaker(bind=engine)

@event.listens_for(Session, 'after_flush')
def delete_tag_orphans(session, ctx):
    # optional: look through Session state to see if we want
    # to emit a DELETE for orphan Tags
    flag = False

    for instance in session.dirty:
        if isinstance(instance, Subreddit) and \
            attributes.get_history(instance, 'keywords').deleted:
            flag = True
            break
    for instance in session.deleted:
        if isinstance(instance, Subreddit):
            flag = True
            break

    # emit a DELETE for all orphan Tags.   This is safe to emit
    # regardless of "flag", if a less verbose approach is
    # desired.
    if flag:
        session.query(Keyword).\
            filter(~Keyword.subreddits.any()).\
            delete(synchronize_session=False)

Again, the keywords persisted despite being attached to no parent.

What I'm trying to accomplish

When children in the database no longer have a parent, I would like them to be removed from the database. What am I doing wrong?


Solution

  • Rather than using auto_delete_orphans, I created a method that I can call when I want to delete children. This method checks the child in question, and sees if it has any parents. If it does have a parent, we leave it be, but if it does not have a parent, we then delete the children.

    Here is how I implemented this method, given that a Subreddit is a parent and a Keyword is a child of Subreddit.

    def check_for_keyword_orphans(keyword):
        # check if each keyword has an associated subreddit
        if len(keyword.subreddits) == 0:
            db.session.delete(keyword)
            return True # keyword deleted
        else:
            return False # keyword has an associated subreddit
    

    And here is how I used the method in my API route:

    keywords = subreddit.keywords
      for keyword in keywords:
        check_for_keyword_orphans(keyword)
            
    db.session.commit()