python django django-models django-south

UUID field added after data already in database. Is there any way to populate the UUID field for existing data?


I've added a UUID field to some of my models and then migrated with South. Any new objects I create have the UUID field populated correctly. However, the UUID field on all my older data is null.

Is there any way to populate UUID data for existing data?


Solution

  • For the following sample class:

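    from django.db import models
    from django_extensions.db.fields import UUIDField
    
    class MyClass(models.Model):
        uuid = UUIDField(editable=False, blank=True)
        name = models.CharField(max_length=255)  # max_length is required; 255 is arbitrary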
    

    If you're using South, create a data migration:

    python ./manage.py datamigration <appname> <migration_name>
    

    And then use the following code to update the migration with the specific logic to add a UUID:

    from django_extensions.utils import uuid
    
    def forwards(self, orm):
        for item in orm['myapp.myclass'].objects.all():
            if not item.uuid:
                item.uuid = uuid.uuid4()  # creates a random (version 4) UUID
                item.save()
    
    
    def backwards(self, orm):
        for item in orm['myapp.myclass'].objects.all():
            if item.uuid:
                item.uuid = None
                item.save()
    

    You can create different types of UUIDs, each generated differently. The uuid.py module in django-extensions has the complete list of the types you can create.
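
    As a rough illustration of those types, the sketch below uses Python's standard-library uuid module rather than the django-extensions one. That substitution is an assumption: it presumes both expose the same uuid1/uuid4/uuid5 functions. The 'example.com' name is purely illustrative.

```python
import uuid

# Version 1: derived from the host's node ID and the current timestamp.
time_based = uuid.uuid1()

# Version 4: derived from random bits; carries no host or time information.
random_based = uuid.uuid4()

# Version 5: SHA-1 hash of a namespace plus a name, so the same inputs
# always produce the same UUID.
name_based = uuid.uuid5(uuid.NAMESPACE_DNS, 'example.com')

print(time_based.version, random_based.version, name_based.version)
```

    For backfilling existing rows, a random (version 4) UUID is usually the simplest choice, since it needs no input and does not expose the machine that generated it.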

    Note that if you run this migration against a table with a lot of objects, it can time out (for instance, when deploying with Fabric). Production environments may need a different way of filling in the existing rows.

    It can also run out of memory on a large number of objects: our deployment ran out of memory and failed with 17,000+ objects.

    To get around this, create a custom iterator that loads the rows in chunks, either in your migration or somewhere more reusable that the migration can import from. It would look something like this:

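    import gc
    
    def queryset_iterator(queryset, chunksize=1000):
        """Yield rows in pk order, loading at most `chunksize` rows at a time."""
        # Stop early on an empty queryset; indexing [0] below would raise IndexError.
        if not queryset.exists():
            return
        last_pk = queryset.order_by('-pk')[0].pk
        queryset = queryset.order_by('pk')
        pk = 0  # assumes positive integer primary keys
        while pk < last_pk:
            # Fetch the next chunk of rows after the last pk we have seen.
            for row in queryset.filter(pk__gt=pk)[:chunksize]:
                pk = row.pk
                yield row
            gc.collect()  # free the previous chunk before loading the next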
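
    To see how the chunking walks the table without a database, here is a minimal sketch of the same pk-based loop over a plain list of dicts. The chunked_by_pk name and the table list are hypothetical stand-ins for the queryset, not part of Django or South.

```python
def chunked_by_pk(rows, chunksize=1000):
    """Yield dicts from `rows` in pk order, one chunk at a time.

    `rows` is a hypothetical in-memory stand-in for a database table.
    """
    ordered = sorted(rows, key=lambda r: r['pk'])
    if not ordered:
        return
    last_pk = ordered[-1]['pk']
    pk = 0  # assumes positive integer primary keys
    while pk < last_pk:
        # Stand-in for queryset.filter(pk__gt=pk)[:chunksize]
        chunk = [r for r in ordered if r['pk'] > pk][:chunksize]
        for row in chunk:
            pk = row['pk']
            yield row


# Gaps in the pk sequence (deleted rows) do not matter.
table = [{'pk': n} for n in (1, 2, 5, 8, 13, 21, 34)]
seen = [row['pk'] for row in chunked_by_pk(table, chunksize=3)]
print(seen)
```

    Each pass resumes from the last pk it yielded, so every row is visited exactly once and only one chunk needs to be held in memory at a time.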
    

    And then your migrations would change to look like this:

    from south.v2 import DataMigration
    from django_extensions.utils import uuid
    
    class Migration(DataMigration):
    
        def forwards(self, orm):
            for item in queryset_iterator(orm['myapp.myclass'].objects.all()):
                if not item.uuid:
                    item.uuid = uuid.uuid1()  # time- and host-based; use uuid4() for a random UUID
                    item.save()
    
        def backwards(self, orm):
            for item in queryset_iterator(orm['myapp.myclass'].objects.all()):
                if item.uuid:
                    item.uuid = None
                    item.save()