django postgresql django-models django-south django-orm

Django ORM — Significant database alterations / data migrations


I have a Django-powered website and a PostgreSQL server on EC2. The website has a growing community and a lot of user-submitted content. During development the models and views have become extremely messy as I tacked on new features. I would like to start fresh, rewrite the whole program, and split some of the models and views into modular apps.

Just as an example, if I wanted to migrate from:

content.models.py

class Content(models.Model):
    user = models.ForeignKey(User)
    post = models.CharField(max_length=500)
    photo = models.ImageField(upload_to='images/%Y/%m/%d')

to:

content.models.py

class Content(models.Model):
    user = models.ForeignKey(User)
    post = models.CharField(max_length=500)

photo.models.py

class Photo(models.Model):
    photo = models.ImageField(upload_to='images/%Y/%m/%d')
    content = models.ForeignKey(content.models.Content)

What is the best way to go about this without losing any data?


Solution

  • This case can be solved with three South migrations:

    1. [schema migration] create Photo table
    2. [data migration] for each Content record, create a Photo record with its photo and content fields set from that record
    3. [schema migration] drop the photo field from the Content table

    Update: step 2

    python manage.py datamigration <app_name> copy_photos_to_separate_model

    A new file will be created in <app_name>/migrations/####_copy_photos…

    Edit that file and fill in its forwards and backwards methods. The first is called when migrating forward, the second when migrating backward.

    The forwards method creates separate Photo records out of the consolidated model. The backwards method has to pick one of the possibly many photos attached to each Content at that point and put it back into the Content model.

    The special orm object represents the state of the models at the time the migration was written, regardless of what models.py looks like when the migration runs. By the time you deploy to production, models.py may differ from how it looked when the migration was run on the test or development environment.

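    def forwards(self, orm):
        # Use the frozen models from `orm`, not imports from models.py.
        Content = orm['<app_name>.Content']
        Photo = orm['<app_name>.Photo']
        for content in Content.objects.all():
            # get_or_create keeps the migration safe to re-run.
            Photo.objects.get_or_create(content=content,
                                        defaults={'photo': content.photo})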
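
For the reverse direction, a minimal sketch of a matching backwards method could look like the following. It is not part of the original answer: it assumes the same `<app_name>` placeholder, and it arbitrarily keeps the Photo with the lowest id for each Content, which you may want to change. In a real migration file it would sit inside the Migration class next to forwards.

```python
def backwards(self, orm):
    # Frozen models, looked up the same way as in the forwards method.
    Content = orm['<app_name>.Content']
    Photo = orm['<app_name>.Photo']
    for content in Content.objects.all():
        # Assumption: when several photos exist, keep the one with the
        # lowest id. Slicing with [:1] yields zero or one result.
        oldest = Photo.objects.filter(content=content).order_by('id')[:1]
        for photo in oldest:
            content.photo = photo.photo
            content.save()
```

Content rows without any Photo are left untouched, so their photo field keeps whatever value the reversed schema migration gave it.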
    

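    This loop issues at least one query per Content row (a SELECT, plus an INSERT when no Photo exists yet). Depending on how big the table is, you can try to reduce the number of queries.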
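
One possible way to cut the query count is to insert in batches with bulk_create. This is only a sketch under some assumptions: Django 1.4 or later (where bulk_create exists), a hypothetical helper name copy_photos_in_bulk, and an arbitrary batch size of 500. It would be called from forwards with the two frozen models.

```python
def copy_photos_in_bulk(Content, Photo, batch_size=500):
    """Create one Photo per Content row using batched INSERTs."""
    # Collect the content ids that already have a Photo, so that
    # re-running the migration does not create duplicates.
    done = set(Photo.objects.values_list('content_id', flat=True))
    batch = []
    for content in Content.objects.all().iterator():
        if content.pk in done:
            continue
        batch.append(Photo(content=content, photo=content.photo))
        if len(batch) >= batch_size:
            Photo.objects.bulk_create(batch)
            batch = []
    # Flush whatever is left over after the loop.
    if batch:
        Photo.objects.bulk_create(batch)
```

Note that bulk_create does not call save() or send model signals, which is usually acceptable inside a data migration but worth checking if Photo relies on either.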

    get_or_create will raise MultipleObjectsReturned if a single Content already has several Photo records in the database, but that should not apply in your case.
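
If you want to confirm that before running the migration, a small check along these lines could help. The function name is hypothetical, and counting in Python with Counter is an assumption made for simplicity; it loads every content_id into memory, so an aggregate query may suit very large tables better.

```python
from collections import Counter

def contents_with_multiple_photos(Photo):
    """Return ids of Content rows referenced by more than one Photo."""
    # One query fetching only the foreign key column.
    counts = Counter(Photo.objects.values_list('content_id', flat=True))
    return sorted(cid for cid, n in counts.items() if n > 1)
```

An empty result means get_or_create in the forwards method will not hit the multiple-records case.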