
Synchronizing data between two Django servers


I have a central Django server containing all of my information in a database. I want to have a second Django server that contains a subset of that information in a second database. I need a bulletproof way to selectively sync data between the two.

  • The secondary Django will need to pull its subset of data from the primary at certain times. The subset will have to be filtered by certain fields.
  • The secondary Django will have to occasionally push its data to the primary.
  • Ideally, the two-way sync would keep the most recently modified objects for each model.

I was thinking of something along the lines of using TimeStampedModel (from django-extensions) or adding my own DateTimeField(auto_now=True) so that every object stores its last modified time. Then I'd need a mechanism to dump the data from one DB and load it into the other such that only the more recently modified objects are kept.
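The timestamp idea above boils down to a last-writer-wins rule. The sketch below is plain Python rather than Django so that it runs on its own: `Record` and `merge_newest` are hypothetical names, a natural key is modeled as a tuple, and `modified` stands in for a DateTimeField(auto_now=True) or TimeStampedModel's `modified` field. It is a minimal illustration under those assumptions, not a Django implementation.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Record:
    # Hypothetical stand-in for a model row: a natural key, some field data,
    # and the last-modified timestamp a TimeStampedModel would maintain.
    key: tuple
    data: dict
    modified: datetime


def merge_newest(local, remote):
    """Two-way merge keeping, per natural key, the most recently modified record."""
    merged = {r.key: r for r in local}
    for r in remote:
        current = merged.get(r.key)
        # Take the remote copy if it is unknown locally or strictly newer.
        if current is None or r.modified > current.modified:
            merged[r.key] = r
    return merged
```

On a timestamp tie the local copy wins; that choice is arbitrary and is worth making explicit in a real sync.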

Possibilities I am considering are Django's dumpdata, django-extensions' dumpscript, django-test-utils' makefixture, or maybe django-fixture-magic. There's a lot to think about, so I'm not sure which road to take.


Solution

  • Here is my solution, which fits all of my requirements:

    1. Implement natural keys and unique constraints on all models
      • Allows for a unique way to refer to each object without using primary key IDs
    2. Make each model a subclass of TimeStampedModel from django-extensions
      • Adds automatically updated created and modified fields
    3. Create a Django management command for exporting, which filters a subset of data and serializes it with natural keys

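      import itertools
      from django.core import serializers

      # Select the subset to export (Baz, Yaz and bar stand for your own models/filter)
      baz = Baz.objects.filter(foo=bar)
      yaz = Yaz.objects.filter(foo=bar)

      # Flatten the querysets into one list of model instances
      flat_objects = list(itertools.chain(baz, yaz))

      # Serialize with natural keys instead of PK IDs (Django 1.7+;
      # older versions used use_natural_keys=True)
      data = serializers.serialize("json", flat_objects, indent=3,
                                   use_natural_foreign_keys=True,
                                   use_natural_primary_keys=True)
      print(data)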
      
    4. Create a Django management command for importing, which reads in the serialized file and iterates through the objects as follows:

      • If the object does not exist in the database (by natural key), create it
      • If the object exists, check the modified timestamps
      • If the imported object is newer, update the fields
      • If the imported object is older, do not update (but print a warning)

    Code sample:

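    from django.core import serializers
    from django.core.exceptions import ObjectDoesNotExist
    from django.core.management.base import BaseCommand


    class Command(BaseCommand):
        help = "Import serialized objects, keeping the most recently modified copy"

        def add_arguments(self, parser):
            parser.add_argument("filename")

        def handle(self, *args, **options):
            # Open the file
            with open(options["filename"]) as data_file:
                json_str = data_file.read()

            # Deserialize and iterate (natural keys are resolved automatically)
            for obj in serializers.deserialize("json", json_str):

                # Get model info
                model_class = obj.object.__class__
                natural_key = obj.object.natural_key()
                manager = model_class._default_manager

                try:
                    # Get the existing object by natural key, not by PK
                    existing_obj = manager.get_by_natural_key(*natural_key)
                except ObjectDoesNotExist:
                    # New object: drop the foreign PK and create it. This is a
                    # raw save, so the imported timestamps are kept.
                    obj.object.pk = None
                    obj.save()
                    continue

                # Check the timestamps
                date_existing = existing_obj.modified
                date_imported = obj.object.modified
                if date_imported > date_existing:
                    # Update fields (attname compares raw FK IDs without extra queries)
                    for field in obj.object._meta.fields:
                        if field.editable and not field.primary_key:
                            imported_val = getattr(obj.object, field.attname)
                            if getattr(existing_obj, field.attname) != imported_val:
                                setattr(existing_obj, field.attname, imported_val)
                    # `modified` is not editable, so copy it explicitly and tell
                    # django-extensions' TimeStampedModel not to reset it to "now"
                    existing_obj.modified = date_imported
                    existing_obj.save(update_modified=False)
                else:
                    # Imported copy is older (or the same age): warn and skip
                    self.stderr.write("Skipping older copy of %s %r"
                                      % (model_class.__name__, natural_key))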
    

    The workflow is to first run python manage.py exportTool > data.json, then, on another Django instance (or the same one), run python manage.py importTool data.json.
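To check the create/update/skip rules from step 4 without touching a database, a dry run can be sketched in plain Python. Everything here is hypothetical (`Row`, `plan_import`, and any sample data); it assumes each row carries a database-local `pk`, a `natural_key` tuple and a `modified` timestamp, and it deliberately ignores `pk` because the two servers assign IDs independently.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Row:
    # Hypothetical row: `pk` is the database-local ID, `natural_key` is the
    # value that identifies the same object on both servers.
    pk: int
    natural_key: tuple
    fields: dict
    modified: datetime


def plan_import(existing_rows, imported_rows):
    """Return (action, natural_key) pairs following step 4: create, update or skip."""
    # Index local rows by natural key; PK IDs are never compared.
    by_key = {row.natural_key: row for row in existing_rows}
    plan = []
    for row in imported_rows:
        existing = by_key.get(row.natural_key)
        if existing is None:
            plan.append(("create", row.natural_key))
        elif row.modified > existing.modified:
            plan.append(("update", row.natural_key))
        else:
            # Older (or same-age) copy: leave it alone and report it as a warning.
            plan.append(("skip", row.natural_key))
    return plan
```

For example, an incoming row with pk 1 and natural key ("bob",) is matched to the local "bob" row even if pk 1 belongs to a different object locally.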