What are some reliable mechanisms to prevent data duplication in CoreData CloudKit?


Every one of our data rows contains a unique uuid column.

Previously, before adopting CloudKit, the uuid column had a unique constraint. This enabled us to prevent data duplication.

Now we have started to integrate CloudKit into our existing CoreData stack, so the unique constraint has been removed. The following user flow will cause data duplication.

Steps to cause data duplication when using CloudKit

  1. Launch the app for the first time.
  2. Since there is no data, pre-defined data with a pre-defined uuid is generated.
  3. The pre-defined data is synced to iCloud.
  4. The app is uninstalled.
  5. The app is re-installed.
  6. Launch the app for the first time after re-installing.
  7. Since there is no data, pre-defined data with a pre-defined uuid is generated.
  8. The old pre-defined data from step 3 is synced to the device.
  9. We now have 2 pre-defined data rows with the same uuid! :(

I was wondering, is there a way for us to prevent such duplication?

In step 8, we wish we had a way to run the following logic before the data is written into CoreData:

Check whether the uuid already exists in CoreData. If not, write the incoming data to CoreData. If it does, pick the row with the latest update date and overwrite the existing data with it.
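To make the intended rule concrete, here is a minimal sketch using plain Swift values instead of Core Data objects. The `Row` type, its `title` and `updatedAt` fields, and the `upsert` function are hypothetical names used only for illustration; they are not part of our app or of any Core Data API.

```swift
import Foundation

// Hypothetical stand-in for one stored row; not the app's real entity.
struct Row: Equatable {
    let uuid: String
    var title: String
    var updatedAt: Date
}

/// Applies an incoming row to the store, keyed by uuid.
/// - If no row with that uuid exists, the incoming row is inserted.
/// - Otherwise the row with the later `updatedAt` is kept.
func upsert(_ incoming: Row, into store: inout [String: Row]) {
    if let existing = store[incoming.uuid], existing.updatedAt >= incoming.updatedAt {
        return // the stored row is at least as new, so keep it
    }
    store[incoming.uuid] = incoming
}

var store: [String: Row] = [:]
let local = Row(uuid: "predefined-1", title: "Local copy", updatedAt: Date(timeIntervalSince1970: 200))
let synced = Row(uuid: "predefined-1", title: "Synced copy", updatedAt: Date(timeIntervalSince1970: 100))

upsert(local, into: &store)   // inserted: the uuid is not present yet
upsert(synced, into: &store)  // ignored: older than the stored row
```

With this rule the store never holds two rows with the same uuid, which is the behaviour we would like to get when CloudKit delivers the old data.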

I once tried to put the above logic into https://developer.apple.com/documentation/coredata/nsmanagedobject/1506209-willsave . To prevent the save, I called self.managedObjectContext?.rollback(), but it just crashes.

Do you have any idea what reliable mechanisms I can use to prevent data duplication in CoreData CloudKit?


Additional info:

Before adopting CloudKit

We are using the following CoreData stack

class CoreDataStack {
    static let INSTANCE = CoreDataStack()
    
    private init() {
    }
    
    private(set) lazy var persistentContainer: NSPersistentContainer = {
        precondition(Thread.isMainThread)
        
        let container = NSPersistentContainer(name: "xxx", managedObjectModel: NSManagedObjectModel.wenote)
        
        container.loadPersistentStores(completionHandler: { (storeDescription, error) in
            if let error = error as NSError? {
                // This is a serious fatal error. We will just simply terminate the app, rather than using error_log.
                fatalError("Unresolved error \(error), \(error.userInfo)")
            }
        })
        
        // So that when backgroundContext write to persistent store, container.viewContext will retrieve update from
        // persistent store.
        container.viewContext.automaticallyMergesChangesFromParent = true
        
        // TODO: Not sure these are required...
        //
        //container.viewContext.mergePolicy = NSMergeByPropertyObjectTrumpMergePolicy
        //container.viewContext.undoManager = nil
        //container.viewContext.shouldDeleteInaccessibleFaults = true
        
        return container
    }()
}

Our CoreData data schema has

  1. A unique constraint.
  2. Deny deletion rule for relationships.
  3. No default values for non-null fields.

After adopting CloudKit

class CoreDataStack {
    static let INSTANCE = CoreDataStack()
    
    private init() {
    }
    
    private(set) lazy var persistentContainer: NSPersistentContainer = {
        precondition(Thread.isMainThread)
        
        let container = NSPersistentCloudKitContainer(name: "xxx", managedObjectModel: NSManagedObjectModel.wenote)
        
        container.loadPersistentStores(completionHandler: { (storeDescription, error) in
            if let error = error as NSError? {
                // This is a serious fatal error. We will just simply terminate the app, rather than using error_log.
                fatalError("Unresolved error \(error), \(error.userInfo)")
            }
        })
        
        // So that when backgroundContext write to persistent store, container.viewContext will retrieve update from
        // persistent store.
        container.viewContext.automaticallyMergesChangesFromParent = true
        
        // TODO: Not sure these are required...
        //
        //container.viewContext.mergePolicy = NSMergeByPropertyObjectTrumpMergePolicy
        //container.viewContext.undoManager = nil
        //container.viewContext.shouldDeleteInaccessibleFaults = true
        
        return container
    }()
}

We changed the CoreData data schema to

  1. No unique constraint.
  2. Nullify deletion rule for relationships.
  3. Default values for non-null fields.

According to feedback from a Developer Technical Support engineer at https://developer.apple.com/forums/thread/699634?login=true , we can look at

  1. Detecting Relevant Changes by Consuming Store Persistent History
  2. Removing Duplicate Data

But it isn't entirely clear how this should be implemented, as the GitHub link provided is broken.


Solution

  • Unique constraints are not available once CoreData is integrated with CloudKit.

    The workaround for this limitation is:

    Once duplication is detected after an insertion by CloudKit, we delete the duplicated data.

    The challenging part of this workaround is: how can we be notified when CloudKit performs an insertion?

    Here are the steps to be notified when CloudKit performs an insertion.

    1. Turn on the NSPersistentHistoryTrackingKey option in CoreData.
    2. Turn on the NSPersistentStoreRemoteChangeNotificationPostOptionKey option in CoreData.
    3. Set viewContext.transactionAuthor = "app" (and do the same for the background context). This is an important step: when we query the transaction history, it lets us tell which DB transactions were initiated by our app and which were initiated by CloudKit.
    4. Whenever we receive the NSPersistentStoreRemoteChange notification (enabled by NSPersistentStoreRemoteChangeNotificationPostOptionKey), we query the transaction history. The query filters on transaction author and the last query token. Please refer to the code example for more details.
    5. Once we detect that a transaction contains an insert on the entity we care about, we delete the duplicated data for that entity.

    Code example

    import CoreData
    
    class CoreDataStack: CoreDataStackable {
        let appTransactionAuthorName = "app"
        
        /**
         The file URL for persisting the persistent history token.
        */
        private lazy var tokenFile: URL = {
            return UserDataDirectory.token.url.appendingPathComponent("token.data", isDirectory: false)
        }()
        
        /**
         Track the last history token processed for a store, and write its value to file.
         
         The historyQueue reads the token when executing operations, and updates it after processing is complete.
         */
        private var lastHistoryToken: NSPersistentHistoryToken? = nil {
            didSet {
                guard let token = lastHistoryToken,
                    let data = try? NSKeyedArchiver.archivedData(withRootObject: token, requiringSecureCoding: true) else { return }
                
                if !UserDataDirectory.token.url.createCompleteDirectoryHierarchyIfDoesNotExist() {
                    return
                }
                
                do {
                    try data.write(to: tokenFile)
                } catch {
                    error_log(error)
                }
            }
        }
        
        /**
         An operation queue for handling history processing tasks: watching changes, deduplicating data, and triggering UI updates if needed.
         */
        private lazy var historyQueue: OperationQueue = {
            let queue = OperationQueue()
            queue.maxConcurrentOperationCount = 1
            return queue
        }()
        
        var viewContext: NSManagedObjectContext {
            persistentContainer.viewContext
        }
        
        static let INSTANCE = CoreDataStack()
        
        private init() {
            // Load the last token from the token file.
            if let tokenData = try? Data(contentsOf: tokenFile) {
                do {
                    lastHistoryToken = try NSKeyedUnarchiver.unarchivedObject(ofClass: NSPersistentHistoryToken.self, from: tokenData)
                } catch {
                    error_log(error)
                }
            }
        }
        
        deinit {
            deinitStoreRemoteChangeNotification()
        }
        
        private(set) lazy var persistentContainer: NSPersistentContainer = {
            precondition(Thread.isMainThread)
            
            let container = NSPersistentCloudKitContainer(name: "xxx", managedObjectModel: NSManagedObjectModel.xxx)
            
            // turn on persistent history tracking
            let description = container.persistentStoreDescriptions.first
            description?.setOption(true as NSNumber, forKey: NSPersistentHistoryTrackingKey)
            description?.setOption(true as NSNumber, forKey: NSPersistentStoreRemoteChangeNotificationPostOptionKey)
            
            container.loadPersistentStores(completionHandler: { (storeDescription, error) in
                if let error = error as NSError? {
                    // This is a serious fatal error. We will just simply terminate the app, rather than using error_log.
                    fatalError("Unresolved error \(error), \(error.userInfo)")
                }
            })
            
            // Provide transaction author name, so that we can know whether this DB transaction is performed by our app
            // locally, or performed by CloudKit during background sync.
            container.viewContext.transactionAuthor = appTransactionAuthorName
            
            // So that when backgroundContext write to persistent store, container.viewContext will retrieve update from
            // persistent store.
            container.viewContext.automaticallyMergesChangesFromParent = true
            
            // TODO: Not sure these are required...
            //
            //container.viewContext.mergePolicy = NSMergeByPropertyObjectTrumpMergePolicy
            //container.viewContext.undoManager = nil
            //container.viewContext.shouldDeleteInaccessibleFaults = true
            
            // Observe Core Data remote change notifications.
            initStoreRemoteChangeNotification(container)
            
            return container
        }()
        
        private(set) lazy var backgroundContext: NSManagedObjectContext = {
            precondition(Thread.isMainThread)
            
            let backgroundContext = persistentContainer.newBackgroundContext()
    
            // Provide transaction author name, so that we can know whether this DB transaction is performed by our app
            // locally, or performed by CloudKit during background sync.
            backgroundContext.transactionAuthor = appTransactionAuthorName
            
            // Similar behavior as Android's Room OnConflictStrategy.REPLACE
            // Old data will be overwritten by new data if index conflicts happen.
            backgroundContext.mergePolicy = NSMergeByPropertyObjectTrumpMergePolicy
            
            // TODO: Not sure these are required...
            //backgroundContext.undoManager = nil
            
            return backgroundContext
        }()
        
        private func initStoreRemoteChangeNotification(_ container: NSPersistentContainer) {
            // Observe Core Data remote change notifications.
            NotificationCenter.default.addObserver(
                self,
                selector: #selector(storeRemoteChange(_:)),
                name: .NSPersistentStoreRemoteChange,
                object: container.persistentStoreCoordinator
            )
        }
        
        private func deinitStoreRemoteChangeNotification() {
            NotificationCenter.default.removeObserver(self)
        }
        
        @objc func storeRemoteChange(_ notification: Notification) {
            // Process persistent history to merge changes from other coordinators.
            historyQueue.addOperation {
                self.processPersistentHistory()
            }
        }
        
        /**
         Process persistent history, posting any relevant transactions to the current view.
         */
        private func processPersistentHistory() {
            backgroundContext.performAndWait {
                
                // Fetch history received from outside the app since the last token
                let historyFetchRequest = NSPersistentHistoryTransaction.fetchRequest!
                historyFetchRequest.predicate = NSPredicate(format: "author != %@", appTransactionAuthorName)
                let request = NSPersistentHistoryChangeRequest.fetchHistory(after: lastHistoryToken)
                request.fetchRequest = historyFetchRequest
    
                let result = (try? backgroundContext.execute(request)) as? NSPersistentHistoryResult
                guard let transactions = result?.result as? [NSPersistentHistoryTransaction] else { return }
    
                if transactions.isEmpty {
                    return
                }
                
                for transaction in transactions {
                    if let changes = transaction.changes {
                        for change in changes {
                            let entity = change.changedObjectID.entity.name
                            let changeType = change.changeType
                            let objectID = change.changedObjectID
                            
                            if entity == "NSTabInfo" && changeType == .insert {
                                deduplicateNSTabInfo(objectID)
                            }
                        }
                    }
                }
                
                // Update the history token using the last transaction.
                lastHistoryToken = transactions.last!.token
            }
        }
        
        private func deduplicateNSTabInfo(_ objectID: NSManagedObjectID) {
            do {
                guard let nsTabInfo = try backgroundContext.existingObject(with: objectID) as? NSTabInfo else { return }
                
                let uuid = nsTabInfo.uuid
                
                guard let nsTabInfos = NSTabInfoRepository.INSTANCE.getNSTabInfosInBackground(uuid) else { return }
                
                if nsTabInfos.isEmpty {
                    return
                }
                
                var bestNSTabInfo: NSTabInfo? = nil
                
                for nsTabInfo in nsTabInfos {
                    if let _bestNSTabInfo = bestNSTabInfo {
                        if nsTabInfo.syncedTimestamp > _bestNSTabInfo.syncedTimestamp {
                            bestNSTabInfo = nsTabInfo
                        }
                    } else {
                        bestNSTabInfo = nsTabInfo
                    }
                }
                
                for nsTabInfo in nsTabInfos {
                    if nsTabInfo === bestNSTabInfo {
                        continue
                    }
                    
                    // Remove old duplicated data!
                    backgroundContext.delete(nsTabInfo)
                }
                
                RepositoryUtils.saveContextIfPossible(backgroundContext)
            } catch {
                error_log(error)
            }
        }
    }
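One point worth considering: in the code example above, when two duplicates have the same syncedTimestamp, the one that happens to be fetched first is kept, and that may not be the same row on every device. The following is a minimal sketch of a deterministic winner selection using plain Swift values. `TabInfoSnapshot`, its `tieBreaker` field, the `Int64` type of `syncedTimestamp`, and `partitionDuplicates` are all assumptions made for illustration, not part of the real NSTabInfo entity.

```swift
import Foundation

// Hypothetical value type standing in for NSTabInfo. The tieBreaker field
// and the Int64 timestamp type are assumptions made for this sketch.
struct TabInfoSnapshot {
    let uuid: String
    let syncedTimestamp: Int64
    let tieBreaker: String   // assumed to be a stable per-record value
}

/// Splits duplicates into the record to keep and the records to delete.
/// The newest syncedTimestamp wins; ties fall back to tieBreaker so that
/// the same input always produces the same winner.
func partitionDuplicates(_ rows: [TabInfoSnapshot]) -> (winner: TabInfoSnapshot, losers: [TabInfoSnapshot])? {
    guard !rows.isEmpty else { return nil }
    // Sort descending by (timestamp, tieBreaker) using tuple comparison.
    let sorted = rows.sorted {
        ($0.syncedTimestamp, $0.tieBreaker) > ($1.syncedTimestamp, $1.tieBreaker)
    }
    return (sorted[0], Array(sorted.dropFirst()))
}

let rows = [
    TabInfoSnapshot(uuid: "u", syncedTimestamp: 10, tieBreaker: "a"),
    TabInfoSnapshot(uuid: "u", syncedTimestamp: 20, tieBreaker: "b"),
    TabInfoSnapshot(uuid: "u", syncedTimestamp: 20, tieBreaker: "c"),
]
let result = partitionDuplicates(rows)
```

In the real stack, the losers would be the objects passed to backgroundContext.delete(_:), exactly as in deduplicateNSTabInfo. Whether a suitable stable tie-breaker field exists depends on the data model.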
    

    Reference

    1. https://developer.apple.com/documentation/coredata/synchronizing_a_local_store_to_the_cloud - In the sample code, the file CoreDataStack.swift illustrates a similar approach to removing duplicated data after cloud sync.
    2. https://developer.apple.com/documentation/coredata/consuming_relevant_store_changes - Information on transaction history.
    3. What's the best approach to prefill Core Data store when using NSPersistentCloudKitContainer? - A similar question