Search code examples
mysqldjangounicodeloaddata

Django loaddata fails for unicode


Here's the model:

class ListItem(models.Model):
   # id -- PK
   dateCreated = models.DateTimeField(auto_now_add=True)
   dateModified = models.DateTimeField(auto_now_add=True)
   listId = models.IntegerField(null=False)  # required FK => List
   itemId = models.CharField(max_length=8, null=False)
   description = models.CharField(max_length=1024)
   notes = models.CharField(max_length=2048)

   class Meta:
      ordering = ('itemId',)

And here is the offending item from the fixture file (json):

    {
    "model": "myproject.ListItem",
    "pk": 721,
    "fields": {
        "listId": 26,
        "itemId": "A3",
        "description": "Statystyka Książek Papierowych, Mówionych I Elektronicznych",
        "notes": "Polish Statistical Book and E-book Classification",
        "dateCreated": "2018-05-14 22:05:25",
        "dateModified": "2018-05-14 22:05:25"
    }
}, ...

and here is the command I used to try to load the data:

python3 manage.py loaddata listItems.json

which resulted in the following error(s):

django.db.utils.OperationalError: Problem installing fixture \
'/Users/sloughin/dev/myproject/fixtures/listItems.json': \
Could not load myproject.ListItem(pk=721): \
(1366, "Incorrect string value: '\\xC4\\x85\\xC5\\xBCek...' \
for column 'description' at row 1")

should I be using some flag in the model to indicate that I expect this field to contain unicode data?

This is running against a MySQL database on Ubuntu 16.04, and I'm running python 3.6.2 on an iMac (OSX 10.13.5). All my other loaddata operations worked fine.


Solution

  • Ok, I think I figured this out. The MySQL database VARCHAR fields default to utf8, which support most (but not all) unicode characters. This is why some records were ok and others were not. You have to use utf8mb4 rather than utf8. This increases the width of the database fields since they now use 4 bytes per character rather than 3, (so for some DBMS you might also need to enlarge the width of the fields in your model by a factor of 4/3) though I'm told MySQL fields are specified in characters, not bytes.

    However, we will need to have a new migration file to work with, so I'm going to change one of the fields to be one char longer.

    class ListItem(models.Model):
          # id -- PK
          dateCreated = models.DateTimeField(auto_now_add=True)
          dateModified = models.DateTimeField(auto_now_add=True)
          listId = models.IntegerField(null=False)  # required FK => List
          itemId = models.CharField(max_length=8, null=False)
          description = models.CharField(max_length=1025)  # was 1024
          notes = models.CharField(max_length=2048)  
    

    I reran python3 manage.py makemigrations

    Next, I went to the migrations folder and edited the new migration file (mine was called 002_auto_20180619_1849.py) which contains class Migrations: where the operations are defined. You will see the AlterField operations from making the field wider, but after that you can append additional SQL operations like this:

    migrations.AlterField(...),
    migrations.RunSQL(
        'alter table myproject_listitem CONVERT TO CHARACTER SET 
         utf8mb4 COLLATE utf8mb4_unicode_ci;'),
    migrations.RunSQL(
        'alter table myproject_listitem CONVERT TO CHARACTER SET 
         utf8mb4 COLLATE utf8mb4_unicode_ci;'),
    migrations.RunSQL(
        'alter table myproject_listitem change description description 
         VARCHAR(1366) CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;'),
    migrations.RunSQL(
        'alter table myproject_listitem change notes notes 
         VARCHAR(2731) CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;'),
    

    Here, I've split each SQL statement into two lines, so you can read it. If you have direct access to the database you could also do this in a mysql session, but I wanted to know how to do it as part of the migration.

    BEFORE you run this migration, check to be sure you have sqlparse installed. If not, then pip3 install sqlparse or include it in your virtual environment or docker container as a requirement. Then:

    python3 manage.py migrate
    Operations to perform:
      Apply all migrations: admin, auth, contenttypes, myproject, sessions
    Running migrations:
      Applying myproject.0002_auto_20180619_1849... OK
    

    After that, I was able to run python3 manage.py loaddata listItems.json and everything worked fine!