
Core Data sync via Parse

I am interested in developing a library that would sync a Core Data model across devices via the Parse mobile backend. I want to mirror the functionality that iCloud Core Data sync attempts to provide.

Why not use iCloud or Ensembles? I am currently using iCloud Core Data sync in a production app, and it is not working well for me. I also want to provide authentication independent of the Apple ID, which is another reason I want to get away from iCloud. As for Ensembles, I am not sure whether it will still work with Dropbox, given the deprecation of the Dropbox Sync API.

I haven’t begun developing the library yet. I am looking for feedback on my plan, which is outlined below. This design is based on this SO answer.

General design of the library:

The library would provide a standard Core Data stack that sets up the persistent store coordinator and managed object context. All of the standard Core Data CRUD operations would go through an interface provided by the library.
Each time a CUD (create, update, or delete) operation takes place, a sync operation object containing all of the information needed to reproduce the operation would be saved to Parse in the background. This includes the type of operation, a unique identifier for the object that was operated on, and, for a create operation, the parent object and relationship.
Each operation would have a change_id number associated with it. Every time the device downloads and executes an operation, it would store that operation's change_id as the latest one it has applied.
Before uploading each sync operation, the device would send a request to the server to check that the change_id stored on the server matches the one stored locally. If the server's change_id is higher, the device would first download and execute all of the newer sync operations, and only then upload its own.
Conflicts (two devices editing the same value while offline) would be resolved by determining which device changed the value last.
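The plan above can be sketched in a few dozen lines. This is a minimal, hypothetical sketch in Python, used only to show the logic (it uses neither Core Data nor the Parse SDK): `SyncOperation`, `FakeServer` and `Device` are illustrative names, the "server" is an in-memory list standing in for a Parse class, and "which device changed the value last" is modelled as a per-attribute timestamp, with the device ID as a tie-breaker so that every replica picks the same winner.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional, Tuple


@dataclass
class SyncOperation:
    change_id: int                       # assigned by the server when the op is stored
    kind: str                            # "create", "update" or "delete"
    object_id: str                       # globally unique ID of the affected object
    attribute: Optional[str] = None      # update only: which attribute changed
    value: Any = None                    # update only: the new value
    timestamp: float = 0.0               # device time of the change, used for conflicts
    device_id: str = ""                  # originating device, used to break timestamp ties
    parent_id: Optional[str] = None      # create only: the parent object
    relationship: Optional[str] = None   # create only: relationship on the parent


class FakeServer:
    """Hypothetical in-memory stand-in for an operation log stored on the backend."""

    def __init__(self) -> None:
        self.log: List[SyncOperation] = []

    def latest_change_id(self) -> int:
        return self.log[-1].change_id if self.log else 0

    def operations_after(self, change_id: int) -> List[SyncOperation]:
        return [op for op in self.log if op.change_id > change_id]

    def append(self, op: SyncOperation) -> SyncOperation:
        op.change_id = self.latest_change_id() + 1
        self.log.append(op)
        return op


class Device:
    def __init__(self, server: FakeServer) -> None:
        self.server = server
        self.last_change_id = 0                                   # latest change_id applied locally
        self.objects: Dict[str, Dict[str, Any]] = {}              # local replica: id -> attributes
        self.stamps: Dict[Tuple[str, Optional[str]], Tuple[float, str]] = {}
        self.pending: List[SyncOperation] = []                    # local ops not yet uploaded

    def apply(self, op: SyncOperation) -> None:
        if op.kind == "create":
            self.objects.setdefault(op.object_id, {})
        elif op.kind == "delete":
            self.objects.pop(op.object_id, None)
        elif op.kind == "update" and op.object_id in self.objects:
            # Updates to objects this replica does not have (e.g. deleted) are dropped.
            key = (op.object_id, op.attribute)
            stamp = (op.timestamp, op.device_id)
            # Last writer wins; the device ID breaks ties so all replicas agree.
            if stamp > self.stamps.get(key, (float("-inf"), "")):
                self.stamps[key] = stamp
                self.objects[op.object_id][op.attribute] = op.value

    def local_change(self, op: SyncOperation) -> None:
        self.apply(op)
        self.pending.append(op)

    def sync(self) -> None:
        # 1. If the server's change_id is ahead, download and replay the missing ops first.
        if self.server.latest_change_id() > self.last_change_id:
            for op in self.server.operations_after(self.last_change_id):
                self.apply(op)
                self.last_change_id = op.change_id
        # 2. Then upload our own queued ops. A real backend must make steps 1 and 2
        #    atomic (or reject the upload if the log moved), or ops appended by
        #    another device in between would be skipped.
        for op in self.pending:
            self.last_change_id = self.server.append(op).change_id
        self.pending.clear()
```

Two pitfalls this toy version hides: last-writer-wins trusts device clocks, so clock skew can pick the wrong winner, and a real server has to make the "check change_id, then append" step atomic.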

Am I missing anything here? What are some potential pitfalls with this approach? I hear that sync is hard; should this type of undertaking be left to the most experienced developers?

Solution

I'm not the least biased responder, because I am the developer of the Ensembles framework, but let me pitch in some thoughts.

Regarding Ensembles itself, it is a backend-agnostic framework. Yes, it works with iCloud and the Dropbox Sync API, but also with CloudKit, the Dropbox Core API (which is not deprecated), and WebDAV. There is also a custom Node.js server, included with one of the Ensembles packages, that lets you host the data yourself using Heroku and S3.

So even if you don't want to stick with Apple, there are other options. But even more than that, you can write your own backend adaptor class. Most are around 500 lines of code, and you can base yours on one of the existing classes. This would allow you to make a backend that stores data and authenticates with Parse, and leave the merging of data to Ensembles. Another advantage is that you can easily move to other backends in future, or offer them as options. (CloudKit is definitely worth a look.)
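To picture the split between storage and merging described above, here is a hypothetical transport-only interface in Python. It is not Ensembles' actual adaptor protocol; `SyncBackend`, `InMemoryBackend` and `fetch_remote_change_sets` are illustrative names. The point is that the merge side depends only on a handful of storage calls, so a Parse-backed class (not shown) could replace the in-memory one without touching the merge code.

```python
from abc import ABC, abstractmethod
from typing import Dict, List


class SyncBackend(ABC):
    """Hypothetical transport-only interface: stores and fetches opaque blobs, never merges."""

    @abstractmethod
    def upload(self, path: str, data: bytes) -> None: ...

    @abstractmethod
    def download(self, path: str) -> bytes: ...

    @abstractmethod
    def list_dir(self, directory: str) -> List[str]: ...

    @abstractmethod
    def remove(self, path: str) -> None: ...


class InMemoryBackend(SyncBackend):
    """Test double; a Parse-, CloudKit- or WebDAV-backed class would implement the same calls."""

    def __init__(self) -> None:
        self.files: Dict[str, bytes] = {}

    def upload(self, path: str, data: bytes) -> None:
        self.files[path] = data

    def download(self, path: str) -> bytes:
        return self.files[path]

    def list_dir(self, directory: str) -> List[str]:
        prefix = directory.rstrip("/") + "/"
        return sorted(p for p in self.files if p.startswith(prefix))

    def remove(self, path: str) -> None:
        self.files.pop(path, None)


def fetch_remote_change_sets(backend: SyncBackend, device_id: str) -> List[bytes]:
    """Sync-engine side: knows only the interface, not which service is behind it."""
    own_prefix = f"changes/{device_id}-"
    return [backend.download(p) for p in backend.list_dir("changes")
            if not p.startswith(own_prefix)]
```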

But if you are determined not to use someone else's framework, then yes, your approach sounds broadly right.

Rather than routing CRUD operations through an interface, you can simply observe NSManagedObjectContextDidSaveNotification and extract the inserted, updated, and deleted objects from its userInfo dictionary.
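A sketch of turning one save into sync operations. In Core Data the notification's userInfo holds sets of managed objects under the inserted, updated and deleted keys; here plain dictionaries with hypothetical `id` and `changed` fields stand in for them so the snippet runs without Core Data. It assumes each object's changed attribute values were captured before the save (how you capture them is up to you), and `operations_from_save` is an illustrative name.

```python
from typing import Any, Dict, List


def operations_from_save(user_info: Dict[str, List[Dict[str, Any]]]) -> List[Dict[str, Any]]:
    """Map the three change sets of one save to create/update/delete sync operations."""
    ops: List[Dict[str, Any]] = []
    for obj in user_info.get("inserted", []):
        # A create carries all initial values so another device can rebuild the object.
        ops.append({"kind": "create", "object_id": obj["id"],
                    "values": dict(obj.get("changed", {}))})
    for obj in user_info.get("updated", []):
        # One update operation per changed attribute, in a stable (sorted) order.
        for attr, value in sorted(obj.get("changed", {}).items()):
            ops.append({"kind": "update", "object_id": obj["id"],
                        "attribute": attr, "value": value})
    for obj in user_info.get("deleted", []):
        ops.append({"kind": "delete", "object_id": obj["id"]})
    return ops
```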

I'm sure you will find lots of little things you didn't think about, and it's these details that tend to make sync hard. One such example is that you need to build something robust enough to handle failures such as the Parse operations not completing before the app quits. You probably need to have a change tag on every object, so you can retrieve the ones that changed since the last sync.
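One way to get both the robustness and the per-object change tag: keep a counter in the local store, stamp each saved object with it, and only advance a persisted "last synced" marker after an upload succeeds. If the app quits mid-upload, the objects still look unsynced on the next launch. This is a hypothetical JSON-file store used for illustration, not Core Data; `LocalStore` and its methods are illustrative names.

```python
import json
import os
from typing import Any, Callable, Dict


class LocalStore:
    """Hypothetical JSON-file store in which every saved object carries a change tag."""

    def __init__(self, path: str) -> None:
        self.path = path
        self.state: Dict[str, Any] = {"counter": 0, "last_synced": 0, "objects": {}}
        if os.path.exists(path):
            with open(path) as f:
                self.state = json.load(f)

    def _flush(self) -> None:
        # Write to a temp file, then rename: os.replace is atomic on POSIX when both
        # paths are on the same filesystem, so a crash never leaves a half-written store.
        tmp = self.path + ".tmp"
        with open(tmp, "w") as f:
            json.dump(self.state, f)
        os.replace(tmp, self.path)

    def save(self, object_id: str, values: Dict[str, Any]) -> None:
        self.state["counter"] += 1
        self.state["objects"][object_id] = {"values": values, "tag": self.state["counter"]}
        self._flush()

    def changed_since_last_sync(self) -> Dict[str, Any]:
        last = self.state["last_synced"]
        return {oid: o for oid, o in self.state["objects"].items() if o["tag"] > last}

    def push(self, upload: Callable[[Dict[str, Any]], None]) -> None:
        changed = self.changed_since_last_sync()
        if not changed:
            return
        high_water = self.state["counter"]
        upload(changed)  # may raise, or the app may be killed while this runs
        # Only advance the marker once the upload has succeeded; otherwise the same
        # objects are picked up again on the next push, even after a relaunch.
        self.state["last_synced"] = high_water
        self._flush()
```

Deletions are not covered here: a deleted object has no tag left to query, so they would need separate tombstone records.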

If your app has a small amount of data, building this system is not terribly difficult, but as your data grows, you need to start using techniques like batching to keep memory usage low on iOS. This sort of thing can take a lot of time. For example, Ensembles 2 has pretty much the same API as Ensembles 1, but I spent about 4 months just rewriting things like batching to be memory efficient.
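A minimal sketch of the batching idea, assuming the records come from a lazy source (for example a generator reading from the store) so that only about one batch is in memory at a time. `batched` and `upload_in_batches` are hypothetical helpers; on the Core Data side, `NSFetchRequest` offers `fetchBatchSize` for fetches, which this snippet does not model.

```python
from itertools import islice
from typing import Callable, Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def batched(items: Iterable[T], size: int) -> Iterator[List[T]]:
    """Yield consecutive lists of at most `size` items without materialising the input."""
    if size < 1:
        raise ValueError("size must be >= 1")
    it = iter(items)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


def upload_in_batches(records: Iterable[T], size: int,
                      upload: Callable[[List[T]], None]) -> int:
    """Send records `size` at a time; returns the number of batches sent."""
    count = 0
    for batch in batched(records, size):
        # If `records` is lazy and `upload` keeps no reference, earlier batches can be freed.
        upload(batch)
        count += 1
    return count
```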

I built a prototype app using the approach you describe (the app was social, not syncing, hence no Ensembles). I used CloudKit, which is very similar to Parse. It took about 1000 lines of Swift code to get the whole data upload/download working OK, with a local Core Data cache. It's certainly doable, especially if you already know Core Data well. Otherwise there might be a learning curve.

My advocacy of a framework like Ensembles is simply that it has already solved many of the small details you will need to solve, and it will not lock you into a particular backend. If Parse decided to raise their fees, you would be free to move elsewhere.