Django-taggit migration fails at AlterUniqueTogether


I'm trying to migrate my Django application, which uses jazzband's django-taggit. The error I get is:

django.db.utils.IntegrityError: could not create unique index "taggit_taggeditem_content_type_id_object_i_4bb97a8e_uniq"
DETAIL:  Key (content_type_id, object_id, tag_id)=(41, 596, 242) is duplicated.

The migration in question reads:

        migrations.AlterUniqueTogether(
            name="taggeditem", unique_together={("content_type", "object_id", "tag")}
        )

https://github.com/jazzband/django-taggit/blob/master/taggit/migrations/0003_taggeditem_add_unique_index.py#L12-L14

Which translates to the following SQL:

ALTER TABLE "taggit_taggeditem" ADD CONSTRAINT "taggit_taggeditem_content_type_id_object_i_4bb97a8e_uniq" UNIQUE ("content_type_id", "object_id", "tag_id");
COMMIT;

Checking the table in question, I get:

# SELECT * FROM public.taggit_taggeditem WHERE tag_id=242 ORDER BY object_id;
  id  | tag_id | object_id | content_type_id 
------+--------+-----------+-----------------
  691 |    242 |       356 |              41
 2904 |    242 |       356 |              41
  680 |    242 |       486 |              41
 2893 |    242 |       486 |              41
  683 |    242 |       596 |              41
 2896 |    242 |       596 |              41

What is the suggested way to resolve the django.db.utils.IntegrityError and successfully finish the migration? I think the same will happen with object_id 486 and 356 (and several more).
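
To see every affected group rather than only tag 242, the rows can be grouped by the three columns of the constraint. A minimal sketch, assuming an in-memory SQLite stand-in that holds only the six rows shown above (the real table lives in PostgreSQL, and the variable names are illustrative):

```python
import sqlite3

# Rows copied from the question: (id, tag_id, object_id, content_type_id).
rows = [
    (691, 242, 356, 41), (2904, 242, 356, 41),
    (680, 242, 486, 41), (2893, 242, 486, 41),
    (683, 242, 596, 41), (2896, 242, 596, 41),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE taggit_taggeditem ("
    "id INTEGER PRIMARY KEY, tag_id INTEGER, "
    "object_id INTEGER, content_type_id INTEGER)"
)
conn.executemany("INSERT INTO taggit_taggeditem VALUES (?, ?, ?, ?)", rows)

# Every key that occurs more than once would violate the unique constraint.
duplicate_groups = conn.execute(
    """
    SELECT content_type_id, object_id, tag_id, COUNT(*) AS copies
    FROM taggit_taggeditem
    GROUP BY content_type_id, object_id, tag_id
    HAVING COUNT(*) > 1
    ORDER BY object_id
    """
).fetchall()

for group in duplicate_groups:
    print(group)
```

The SELECT itself is standard SQL, so it can also be run as-is in psql against public.taggit_taggeditem.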


Solution

  • Before the unique constraint is added, you should remove the duplicates with a data migration. It is probably best to first remove the migration file that adds the constraint, then create an empty data migration with:

    python3 manage.py makemigrations --empty app_name

    This migration could look like:

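    from django.db import migrations
    from django.db.models import Exists, OuterRef
    
    def remove_duplicates(apps, schema_editor):
        # Inside a migration, load the historical model instead of importing it.
        Model = apps.get_model('app_name', 'ModelName')
        # Flag each row for which an older row (smaller pk) with the same
        # (content_type_id, object_id, tag_id) exists, then delete the flagged rows.
        Model.objects.annotate(
            has_dup=Exists(
                Model.objects.filter(
                    pk__lt=OuterRef('pk'),
                    content_type_id=OuterRef('content_type_id'),
                    object_id=OuterRef('object_id'),
                    tag_id=OuterRef('tag_id'),
                )
            )
        ).filter(has_dup=True).delete()
    
    class Migration(migrations.Migration):
    
        # Placeholder: point this at the app's actual previous migration.
        dependencies = [
            ('app_name', '1234_some_migration'),
        ]
    
        operations = [
            migrations.RunPython(remove_duplicates),
        ]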

    Replace app_name and ModelName with the names of the app and the model respectively, and check that the dependencies entry points to the app's previous migration file.

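    For each record, this checks whether another record exists with the same content_type_id, object_id and tag_id but a smaller primary key. If so, the current record is a duplicate and is deleted, so only the record with the lowest primary key in each group is kept.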

    Next, create the migrations for the app again:

    python3 manage.py makemigrations app_name

    I would (strongly) advise backing up the database before running this, since a data migration can always contain a subtle problem.
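
One way to rehearse the cleanup before touching real data is to replay it on a throwaway copy. The sketch below is a stand-in under stated assumptions, not the taggit migration itself: it uses an in-memory SQLite table with the six rows from the question, a hypothetical index name taggeditem_uniq, and a hand-written DELETE ... WHERE EXISTS that mirrors the Exists/OuterRef filter above.

```python
import sqlite3

# Rows copied from the question: (id, tag_id, object_id, content_type_id).
ROWS = [
    (691, 242, 356, 41), (2904, 242, 356, 41),
    (680, 242, 486, 41), (2893, 242, 486, 41),
    (683, 242, 596, 41), (2896, 242, 596, 41),
]

# Hypothetical index name; Django generates its own constraint name.
ADD_UNIQUE = (
    "CREATE UNIQUE INDEX taggeditem_uniq "
    "ON taggit_taggeditem (content_type_id, object_id, tag_id)"
)

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE taggit_taggeditem ("
    "id INTEGER PRIMARY KEY, tag_id INTEGER, "
    "object_id INTEGER, content_type_id INTEGER)"
)
conn.executemany("INSERT INTO taggit_taggeditem VALUES (?, ?, ?, ?)", ROWS)

# Step 1: with duplicates present, adding the unique index is rejected.
try:
    conn.execute(ADD_UNIQUE)
    failed_before_cleanup = False
except sqlite3.IntegrityError:
    failed_before_cleanup = True

# Step 2: delete every row that has an older twin (same key, smaller id).
conn.execute(
    """
    DELETE FROM taggit_taggeditem
    WHERE EXISTS (
        SELECT 1 FROM taggit_taggeditem AS older
        WHERE older.id < taggit_taggeditem.id
          AND older.content_type_id = taggit_taggeditem.content_type_id
          AND older.object_id = taggit_taggeditem.object_id
          AND older.tag_id = taggit_taggeditem.tag_id
    )
    """
)

# Step 3: the same statement now succeeds because each key occurs once.
conn.execute(ADD_UNIQUE)

remaining_ids = [
    row[0] for row in conn.execute("SELECT id FROM taggit_taggeditem ORDER BY id")
]
print(failed_before_cleanup, remaining_ids)
```

Because the row with the lowest id in a group never matches the EXISTS condition, exactly one row per key survives regardless of how many copies there were.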