ios core-data synchronization icloud ensembles

Removing near duplicates with Core Data and Ensembles (iCloud)


Summary

I want to get rid of near duplicates in my Core Data-based iOS project, which uses Ensembles to sync with iCloud.

  • The sync with iCloud generally works well in my app.
  • The problem occurs when a user creates similar objects on multiple devices before the persistent store is leeched by Ensembles (connected to iCloud).
  • This generates near duplicates, which is technically the correct behavior.
  • My approach to removing these duplicates doesn't seem to work.

Detailed problem

A user can create NSManagedObjects on different devices before connecting to iCloud. Let's say he has an NSManagedObject named Car, which has a "To One" relationship to an NSManagedObject named Person, which in turn has a "To Many" relationship to Car. This would look like this: [image: a simplified model]

OK, let's imagine the user has two devices and creates two NSManagedObjects on each of them. On the first device he creates a Car named "Audi" and a Person named "Raphael", connected through a relationship. On the other device he creates a Car named "BMW" and another Person named "Raphael", also connected to each other. Now the user has two similar objects across his devices: two Person objects, both named "Raphael".

My problem is that the user ends up with two Person objects named "Raphael" on each device after syncing.

This is actually correct, since the objects get their uniqueIdentifiers (used to identify objects in Ensembles) when the user leeches his persistent store. The objects are factually different. But this is what I want to fix.

My approach

I implemented this delegate method and removed the duplicates in the reparationContext.

- (BOOL)persistentStoreEnsemble:(CDEPersistentStoreEnsemble *)ensemble 
    shouldSaveMergedChangesInManagedObjectContext:(NSManagedObjectContext *)savingContext
    reparationManagedObjectContext:(NSManagedObjectContext *)reparationContext {

    [reparationContext performBlockAndWait:^{

        // Find duplicates
        // Change relationships and only use the inserted Person object (the one from iCloud)
        // Delete local Person object
        [reparationContext save:nil];
    }];
    return YES;
}

Basically this seems to work well on the second device that merges the data from the first device. But unfortunately it seems that the local person is still synced to iCloud even if it was deleted in the reparationContext.

This leads to a broken state since the first device then also merges the changes from the second device and replaces the person again which was already deleted on the second device. Some syncs later the person is finally missing in the car relationship and the app throws syncing errors.

Steps to reproduce the problem

  • Step 1 (Device 1)

    • Create objects
    • Data: Car "Audi" -> Person "Raphael (Device 1)"
  • Step 2 (Device 2)

    • Create objects
    • Data: Car "BMW" -> Person "Raphael (Device 2)"
  • Step 3 (Device 1)

    • Leech data from store
    • Connect to iCloud
    • Send data to iCloud
    • Data: Car "Audi" -> Person "Raphael (Device 1)"
  • Step 4 (Device 2)

    • Leech data from store
    • Connect to iCloud
    • Merge data from iCloud
    • Replace local person from Device 2 with inserted person from Device 1
    • Delete local person from Device 2
    • Send data to iCloud
    • Data:
      Car "Audi" -> Person "Raphael (Device 1)"
      Car "BMW" -> Person "Raphael (Device 1)"
  • Step 5 (Device 1)

    • Merge data from iCloud
    • Replace local person from Device 1 with inserted person from Device 2 (this shouldn’t happen)
    • Delete local person from Device 1 (this shouldn’t happen)
    • Send data to iCloud
    • Expected data:
      Car "Audi" -> Person "Raphael (Device 1)"
      Car "BMW" -> Person "Raphael (Device 1)"
    • Actual data:
      Car "Audi" -> Person "Raphael (Device 2)"
      Car "BMW" -> Person "Raphael (Device 2)"

The local person object "Raphael (Device 2)" was deleted in Step 4, but it seems that it was still sent to iCloud: in Step 5 it shows up as an insert in savingContext.insertedObjects in the shouldSaveMergedChangesInManagedObjectContext delegate method.

As far as I understand, Ensembles first pulls changes from iCloud, asks the app via the delegate methods whether everything is as expected, then merges into the persistent store and sends deltas to iCloud after the merge.

Am I doing something wrong? Or is this an Ensembles bug?


Solution

  • There is the issue that lars mentioned. You do have to be careful to always do things deterministically. Sorting on unique id is one way to do that.

    Personally, I would handle this one of two other ways:

    1. Do the dedupe after a merge completes (again, making sure it is deterministic).
    2. Use carefully chosen global identifiers to control the dedupe for you.

    For example, you could use the unique id "Raphael". The only thing you then need to be careful of is that when you go to create another Raphael on the same machine, it is called "Raphael_1" (or whatever).
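The naming rule just described can be sketched with plain Foundation. This is only an illustration under assumptions: `UniqueIdentifierForName` is a hypothetical helper, and the set of identifiers already used on the device is assumed to be fetched by the app beforehand. The resulting string would be stored on the object and handed to Ensembles as its global identifier, typically from the delegate method `persistentStoreEnsemble:globalIdentifiersForManagedObjects:`.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper (not part of Ensembles): returns `name` if it is not yet
// used on this device, otherwise the first free "name_1", "name_2", ... variant.
static NSString *UniqueIdentifierForName(NSString *name,
                                         NSSet<NSString *> *existingIdentifiers) {
    if (![existingIdentifiers containsObject:name]) {
        return name;
    }
    NSUInteger suffix = 1;
    NSString *candidate = nil;
    do {
        candidate = [NSString stringWithFormat:@"%@_%lu", name, (unsigned long)suffix];
        suffix++;
    } while ([existingIdentifiers containsObject:candidate]);
    return candidate;
}
```

With this rule, the first "Raphael" created on each device gets the same identifier on both devices, while a second "Raphael" on the same device gets a suffixed one.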

    If your unique id is very likely to be unique (e.g. first + last name is unlikely to clash), Ensembles will automatically merge the person on different devices.
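For the other route, deduplicating deterministically by sorting on the unique id, a minimal sketch of the "which duplicate survives" decision might look like the following. It is an assumption-laden illustration: people are modeled as plain dictionaries with hypothetical `name` and `uniqueIdentifier` keys rather than real managed objects, and `IdentifiersToDelete` is an invented name. The point is that every device picks the same survivor (the smallest identifier), instead of each device preferring whichever object happens to be remote.

```objc
#import <Foundation/Foundation.h>

// Hypothetical stand-in for Person records:
//   @{ @"name": ..., @"uniqueIdentifier": ... }
// Returns the identifiers to delete: within every group of records sharing a
// name, all records except the one with the smallest uniqueIdentifier.
static NSSet<NSString *> *IdentifiersToDelete(
        NSArray<NSDictionary<NSString *, NSString *> *> *people) {
    // Sort by uniqueIdentifier so the outcome does not depend on input order
    // (and therefore not on which device runs the dedupe).
    NSSortDescriptor *byIdentifier =
        [NSSortDescriptor sortDescriptorWithKey:@"uniqueIdentifier"
                                      ascending:YES
                                       selector:@selector(compare:)];
    NSArray<NSDictionary<NSString *, NSString *> *> *sorted =
        [people sortedArrayUsingDescriptors:@[byIdentifier]];

    NSMutableSet<NSString *> *seenNames = [NSMutableSet set];
    NSMutableSet<NSString *> *toDelete = [NSMutableSet set];
    for (NSDictionary<NSString *, NSString *> *person in sorted) {
        NSString *name = person[@"name"];
        if ([seenNames containsObject:name]) {
            // A record with a smaller identifier already claimed this name.
            [toDelete addObject:person[@"uniqueIdentifier"]];
        } else {
            [seenNames addObject:name];
        }
    }
    return toDelete;
}
```

In a real app the same rule would be applied to fetched Person objects, and the cars of each deleted Person would be repointed to the survivor before the delete.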