
What is the right way to perform persistent history purging, without affecting the correctness of CloudKit?


We are currently using a local Core Data store with CloudKit sync, via NSPersistentCloudKitContainer.

Why do we enable persistent history tracking?

Due to the problem described at https://stackoverflow.com/a/72554542/72437 , we need to enable NSPersistentHistoryTrackingKey.
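For reference, here is a minimal sketch of what enabling history tracking typically looks like on an NSPersistentCloudKitContainer store description, assuming a single store. The helper name enableHistoryTracking(on:) and the model name "Model" are hypothetical. NSPersistentStoreRemoteChangeNotificationPostOptionKey is also set, on the assumption that the storeRemoteChange(_:) handler in the tests below is driven by remote-change notifications. The options must be set before the stores are loaded.

```swift
import CoreData

/// Enables persistent history tracking and remote-change notifications.
/// Must be called before loadPersistentStores(completionHandler:).
func enableHistoryTracking(on description: NSPersistentStoreDescription) {
    // Record every transaction in the persistent history.
    description.setOption(true as NSNumber, forKey: NSPersistentHistoryTrackingKey)
    // Post .NSPersistentStoreRemoteChange when the store is changed outside
    // this coordinator (including CloudKit imports).
    description.setOption(true as NSNumber,
                          forKey: NSPersistentStoreRemoteChangeNotificationPostOptionKey)
}

// Hypothetical usage; "Model" stands in for the app's data model name.
func makeContainer() -> NSPersistentCloudKitContainer {
    let container = NSPersistentCloudKitContainer(name: "Model")
    if let description = container.persistentStoreDescriptions.first {
        enableHistoryTracking(on: description)
    }
    return container
}
```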


Purge History

According to https://developer.apple.com/documentation/coredata/consuming_relevant_store_changes, we should purge the persistent history manually.


However, it isn't clear how we can purge the history safely without affecting the correctness of CloudKit sync. We ran a few tests with the following setup:

  1. Run the app on a simulator. We perform insert operations in the simulator.
  2. Run the app on a real device. The real device receives a silent push notification as a result of step 1.
  3. Both the simulator and the real device run the same code.
  4. Whenever we insert an item in the simulator, we observe what happens on the real device.

Test 1: Purge all history data immediately after processing

@objc func storeRemoteChange(_ notification: Notification) {
    // Process persistent history to merge changes from other coordinators.
    historyQueue.addOperation {
        self.processPersistentHistory()
    }
}

/**
 Process persistent history, posting any relevant transactions to the current view.
 */
private func processPersistentHistory() {
    backgroundContext.performAndWait {
        
        // Fetch history received from outside the app since the last token
        let historyFetchRequest = NSPersistentHistoryTransaction.fetchRequest!
        historyFetchRequest.predicate = NSPredicate(format: "author != %@", appTransactionAuthorName)
        let request = NSPersistentHistoryChangeRequest.fetchHistory(after: lastHistoryToken)
        request.fetchRequest = historyFetchRequest

        let result = (try? backgroundContext.execute(request)) as? NSPersistentHistoryResult
        guard let transactions = result?.result as? [NSPersistentHistoryTransaction] else { return }

        ...
        
        // Update the history token using the last transaction; bail out if there is none.
        guard let lastToken = transactions.last?.token else { return }
        lastHistoryToken = lastToken
        
        // Remove history before the last history token
        let purgeHistoryRequest = NSPersistentHistoryChangeRequest.deleteHistory(before: lastHistoryToken)
        do {
            try backgroundContext.execute(purgeHistoryRequest)
        } catch {
            error_log(error)
        }
    }
}
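The snippet above relies on wiring the question does not show: a serial historyQueue, a backgroundContext whose saves are tagged with appTransactionAuthorName (so that the author != %@ predicate can skip the app's own transactions), and an observer for .NSPersistentStoreRemoteChange. The following is a hedged sketch of that wiring, not the question's actual code. The HistoryProcessor class is hypothetical, and the history processing itself is left as a placeholder.

```swift
import CoreData

/// Hypothetical helper showing the setup that processPersistentHistory() assumes.
final class HistoryProcessor: NSObject {
    let container: NSPersistentContainer
    let appTransactionAuthorName: String
    let backgroundContext: NSManagedObjectContext

    /// Serial queue, so history batches are processed one at a time and in order.
    let historyQueue: OperationQueue = {
        let queue = OperationQueue()
        queue.maxConcurrentOperationCount = 1
        return queue
    }()

    init(container: NSPersistentContainer, authorName: String) {
        self.container = container
        self.appTransactionAuthorName = authorName
        let context = container.newBackgroundContext()
        // Tag this app's own saves so the `author != %@` predicate filters them out.
        context.transactionAuthor = authorName
        self.backgroundContext = context
        super.init()
        // Only listen for remote changes to this container's coordinator.
        NotificationCenter.default.addObserver(
            self,
            selector: #selector(storeRemoteChange(_:)),
            name: .NSPersistentStoreRemoteChange,
            object: container.persistentStoreCoordinator)
    }

    deinit {
        NotificationCenter.default.removeObserver(self)
    }

    @objc func storeRemoteChange(_ notification: Notification) {
        historyQueue.addOperation {
            // processPersistentHistory() from the snippet above would run here.
        }
    }
}
```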

We observed that the real device receives incorrect CloudKit sync results: it either gets duplicated data, or its data gets deleted.

Our hypothesis is:

  1. Persistent history data is shared among multiple persistent store coordinators.
  2. Our visible coordinator finishes processing the transactions, records lastHistoryToken, and then purges all history older than lastHistoryToken.
  3. However, there is another, invisible coordinator that CloudKit uses for syncing. There is a high chance that this CloudKit coordinator has not yet processed the purged history transactions.
  4. As a result, the data goes wrong when CloudKit tries to sync the real device's data without the necessary transaction history.

Test 2: Purge all history data older than 2 minutes after processing

We fine-tuned the above code to remove only transaction history older than 2 minutes.

// Remove history older than 2 minutes.
let date = Date(timeIntervalSinceNow: -2 * 60)
let purgeHistoryRequest = NSPersistentHistoryChangeRequest.deleteHistory(before: date)
do {
    try backgroundContext.execute(purgeHistoryRequest)
} catch {
    error_log(error)
}

We observed that:

  1. If less than 2 minutes elapse between the previous storeRemoteChange notification and the current one, the real device receives correct CloudKit sync results.
  2. If more than 2 minutes elapse between the previous storeRemoteChange notification and the current one, the real device receives incorrect CloudKit sync results: it either gets duplicated data, or its data gets deleted.

Summary & Question

In "How to prune history right in a CoreData+CloudKit app?", the author suggests:

So it is indeed safe to prune the persistent history after say seven days after it has been processed.

Consider the case of one user with two devices:

  1. The user reads and writes frequently on device A, which they use often.
  2. The user launches the same app on device B, which they rarely use, 7 days after last using it.

Does this mean that device B will receive incorrect CloudKit sync results? (Based on the Test 2 observations, it seems so.)

If so, what is a good way to purge persistent history without affecting the correctness of CloudKit?


How can I run Test 2?

You can set up and run Test 2 as follows:

  1. Set up and run the sample from https://developer.apple.com/documentation/coredata/synchronizing_a_local_store_to_the_cloud
  2. Replace CoreDataStack.swift with https://gist.github.com/yccheok/df21f199b81b19764ffbcd4a4583c430 . It contains helper functions for Date and the 2-minute history purging code.
  3. In the simulator, create 1 record by tapping the button at the top right. You can observe that the real device now has this record.
  4. After 3 minutes, tap the top-right button again. In the simulator, you can observe that there are 2 records in total. However, on the real device, the data is gone!

(Screenshot: the left device is a real device, and the right device is a simulator.)


Solution

  • UPDATE: Particularly important for a CoreData + CloudKit setup

    In this post from a WWDC22 Core Data Lab, an Apple Core Data framework engineer answers the question "Do I ever need to purge the persistent history tracking data?" as follows:

    No. We don’t recommend it. NSPersistentCloudKitContainer uses the persistent history token to track what to sync. If you delete history the cloud sync is reset and has to upload everything from scratch. It will recover but it’s not a good customer experience. It shouldn’t normally be necessary to delete history. For example, the Apple Photos app doesn’t trim its history, so unless you’re generating massive amounts of history don’t do it.

    tl;dr:

    It seems that purging the persistent history after 7 days works in almost all cases.
    It probably does not, if GBs of data have to be synced.

    What I did:

    I could reproduce the error:
    If, in Apple's demo app, data are synced after the persistent history has been purged, wrong data may be displayed. Apparently some information that is essential for the demo app has been deleted.

    I started testing with a clean setup:
    I deleted the app from the simulator and the device, and cleared all CD_Post records in the iCloud private database, zone com.apple.coredata.cloudkit.zone, using the dashboard.
    To check for information that might have been deleted unintentionally, I added a print statement to the guard statement in func processPersistentHistory() that filters the persistent history for transactions:

    guard let transactions = result?.result as? [NSPersistentHistoryTransaction],
          !transactions.isEmpty
          else {
            print("**************** \(String(describing: result?.result))")
            return
          }  
    

    When I ran the app on the simulator under Xcode, no entries were shown, as expected, and the log now showed many entries like this:

    **************** Optional(<__NSArray0 0x105a61900>(
    
    )
    )  
    

    Apparently the persistent history contains iCloud mirroring housekeeping information that is deleted when the persistent history is purged. This indicates to me that the mirroring software needs "enough time" to finish its operation successfully, and thus only "old" history entries should be purged. But what is "old"? 7 days?

    Next, on the simulator under Xcode, I installed and executed the app with immediate purging as in Test 1 of the question.

    // Remove history before the last history token
    let purgeHistoryRequest = NSPersistentHistoryChangeRequest.deleteHistory(before: lastHistoryToken)
    do {
      try taskContext.execute(purgeHistoryRequest)
    } catch {
      print("\(error)")
    }
    

    On the simulator, I added an entry. This entry was shown in the dashboard.

    Then, on the device under Xcode, I also installed and executed the app with immediate purging. The entry was shown correctly, i.e. the iCloud record was mirrored to the persistent store of the device, and the history was processed and immediately purged, although the mirroring software may not have had "enough time" to finish its operation successfully.

    On the simulator, I added a 2nd entry. This entry was also shown in the dashboard.

    However, on the device the 1st entry disappeared, i.e. the table was now empty, but both entries were still shown in the dashboard, i.e. the iCloud data were not corrupted.

    I then set a breakpoint at DispatchQueue.main.async in func processPersistentHistory(). This breakpoint is only reached when a remote change of the persistent store is processed. To reach the breakpoint on the device, I added a 3rd entry in the simulator. The breakpoint was then reached on the device, and in the debugger I entered

    (lldb) po taskContext.fetch(Post.fetchRequest())  
    ▿ 3 elements
      - 0 : <Post: 0x281400910> (entity: Post; id: 0xbc533cc5eb8b892a <x-coredata://C9DEC274-B479-4AF5-9349-76C1BABB5016/Post/p3>; data: <fault>)
      - 1 : <Post: 0x281403d90> (entity: Post; id: 0xbc533cc5eb6b892a <x-coredata://C9DEC274-B479-4AF5-9349-76C1BABB5016/Post/p4>; data: <fault>)
      - 2 : <Post: 0x281403390> (entity: Post; id: 0xbc533cc5eb4b892a <x-coredata://C9DEC274-B479-4AF5-9349-76C1BABB5016/Post/p5>; data: <fault>)
    

    This indicates to me that the persistent store in the device has correct data, and only the displayed table is wrong.

    Next, I investigated func update in the MainViewController. This function is called from func didFindRelevantTransactions, which is called when history is processed and relevant transactions are posted. During my tests, transactions.count was always <= 10, so the transactions were processed in the transactions.forEach block.
    To find out what NSManagedObjectContext.mergeChanges does, I modified the code as follows:

    transactions.forEach { transaction in
      guard let userInfo = transaction.objectIDNotification().userInfo else { return }
      let viewContext = dataProvider.persistentContainer.viewContext
      print("BEFORE: \(dataProvider.fetchedResultsController.fetchedObjects!)")
      print("================ mergeChanges: userInfo: \(userInfo)")
      NSManagedObjectContext.mergeChanges(fromRemoteContextSave: userInfo, into: [viewContext])
      print("AFTER: \(dataProvider.fetchedResultsController.fetchedObjects!)")
    }  
    

    To see what happens to the viewContext, I implemented

    @objc func managedObjectContextObjectsDidChange(notification: NSNotification) {
      guard let userInfo = notification.userInfo else { return }
      print(#function, userInfo)  
    }
    
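    The handler above only fires once it is registered with NotificationCenter. A minimal sketch, assuming a hypothetical ContextChangeLogger class that observes a single context (in the demo app, this would be dataProvider.persistentContainer.viewContext). The receivedCount counter is added only to make the registration observable.

    ```swift
    import CoreData

    /// Hypothetical helper that registers the ObjectsDidChange handler for one context.
    final class ContextChangeLogger: NSObject {
        private(set) var receivedCount = 0

        init(context: NSManagedObjectContext) {
            super.init()
            // Observe only notifications posted for this particular context.
            NotificationCenter.default.addObserver(
                self,
                selector: #selector(managedObjectContextObjectsDidChange(notification:)),
                name: .NSManagedObjectContextObjectsDidChange,
                object: context)
        }

        deinit {
            NotificationCenter.default.removeObserver(self)
        }

        @objc func managedObjectContextObjectsDidChange(notification: NSNotification) {
            receivedCount += 1
            guard let userInfo = notification.userInfo else { return }
            print(#function, userInfo)
        }
    }
    ```

    Passing the context as the object parameter keeps notifications from other contexts (for example, the background contexts used for history processing) out of the log.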

    and to see how this influences the fetchedResultsController, I implemented also

    func controller(_ controller: NSFetchedResultsController<NSFetchRequestResult>, 
                    didChange anObject: Any, 
                    at indexPath: IndexPath?, 
                    for type: NSFetchedResultsChangeType, 
                    newIndexPath: IndexPath?) {
      print("**************** ", #function, "\(type) ", anObject)
    }  
    

    To keep the logs relatively short, I deleted all CD_Post entries except the 1st one in the dashboard, and deleted the app from the simulator and the device.
    I then ran the app under Xcode on the simulator and the device. Both showed the 1st entry.

    I then entered another entry in the simulator. As unfortunately expected, the table on the device was cleared. Here is the log of the device:

    BEFORE: [<Post: 0x2802c2d50> (entity: Post; id: 0x9aac7c6d193c7772 <x-coredata://496D2B54-DDB9-47EF-945A-CC1DBA1E14E8/Post/p1>; data: {
        attachments =     (
        );
        content = nil;
        location = nil;
        tags =     (
        );
        title = "Untitled 3:40:24 PM";
    }), <Post: 0x2802d2a80> (entity: Post; id: 0x9aac7c6d195c7772 <x-coredata://496D2B54-DDB9-47EF-945A-CC1DBA1E14E8/Post/p2>; data: <fault>)]
    ================ mergeChanges: userInfo: [AnyHashable("deleted_objectIDs"): {(
        0x9aac7c6d195c7772 <x-coredata://496D2B54-DDB9-47EF-945A-CC1DBA1E14E8/Post/p2>,
        0x9aac7c6d193c7772 <x-coredata://496D2B54-DDB9-47EF-945A-CC1DBA1E14E8/Post/p1>
    )}]
    managedObjectContextObjectsDidChange(notification:) [AnyHashable("managedObjectContext"): <_PFWeakReference: 0x2821a8100>, AnyHashable("deleted"): {(
        <Post: 0x2802d2a80> (entity: Post; id: 0x9aac7c6d195c7772 <x-coredata://496D2B54-DDB9-47EF-945A-CC1DBA1E14E8/Post/p2>; data: {
        attachments =     (
        );
        content = nil;
        location = nil;
        tags =     (
        );
        title = nil;
    }),
        <Post: 0x2802c2d50> (entity: Post; id: 0x9aac7c6d193c7772 <x-coredata://496D2B54-DDB9-47EF-945A-CC1DBA1E14E8/Post/p1>; data: {
        attachments =     (
        );
        content = nil;
        location = nil;
        tags =     (
        );
        title = "Untitled 3:40:24 PM";
    })
    )}, AnyHashable("NSObjectsChangedByMergeChangesKey"): {(
    )}]
    ****************  controller(_:didChange:at:for:newIndexPath:) NSFetchedResultsChangeType(rawValue: 2)  <Post: 0x2802d2a80> (entity: Post; id: 0x9aac7c6d195c7772 <x-coredata://496D2B54-DDB9-47EF-945A-CC1DBA1E14E8/Post/p2>; data: {
        attachments =     (
        );
        content = nil;
        location = nil;
        tags =     (
        );
        title = nil;
    })
    ****************  controller(_:didChange:at:for:newIndexPath:) NSFetchedResultsChangeType(rawValue: 2)  <Post: 0x2802c2d50> (entity: Post; id: 0x9aac7c6d193c7772 <x-coredata://496D2B54-DDB9-47EF-945A-CC1DBA1E14E8/Post/p1>; data: {
        attachments =     (
        );
        content = nil;
        location = nil;
        tags =     (
        );
        title = "Untitled 3:40:24 PM";
    })
    managedObjectContextObjectsDidChange(notification:) [AnyHashable("updated"): {(
        <NSCKRecordZoneMetadata: 0x2802ce9e0> (entity: NSCKRecordZoneMetadata; id: 0x9aac7c6d193c77d2 <x-coredata://496D2B54-DDB9-47EF-945A-CC1DBA1E14E8/NSCKRecordZoneMetadata/p1>; data: {
        ckOwnerName = "__defaultOwner__";
        ckRecordZoneName = "com.apple.coredata.cloudkit.zone";
        currentChangeToken = "<CKServerChangeToken: 0x2823fcdc0; data=AQAAAAAAAACQf/////////+gT9nZvOBLv7hsIaI3NVdg>";
        database = "0x9aac7c6d193c77e2 <x-coredata://496D2B54-DDB9-47EF-945A-CC1DBA1E14E8/NSCKDatabaseMetadata/p1>";
        encodedShareData = nil;
        hasRecordZoneNum = 1;
        hasSubscriptionNum = 0;
        lastFetchDate = "2022-06-15 13:55:25 +0000";
        mirroredRelationships = "<relationship fault: 0x2821a3c60 'mirroredRelationships'>";
        needsImport = 0;
        needsRecoveryFromIdentityLoss = 0;
        needsRecoveryFromUserPurge = 0;
        needsRecoveryFromZoneDelete = 0;
        needsShareDelete = 0;
        needsShareUpdate = 0;
        queries = "<relationship fault: 0x2821a2560 'queries'>";
        records =     (
        );
        supportsAtomicChanges = 1;
        supportsFetchChanges = 1;
        supportsRecordSharing = 1;
        supportsZoneSharing = 1;
    })
    )}, AnyHashable("managedObjectContext"): <_PFWeakReference: 0x2821a1900>, AnyHashable("deleted"): {(
        <NSCKRecordMetadata: 0x2802ce850> (entity: NSCKRecordMetadata; id: 0x9aac7c6d193c7762 <x-coredata://496D2B54-DDB9-47EF-945A-CC1DBA1E14E8/NSCKRecordMetadata/p1>; data: {
        ckRecordName = "3FB952E5-6B30-472E-BC6E-0116FA507B88";
        ckRecordSystemFields = nil;
        ckShare = nil;
        encodedRecord = "{length = 50, bytes = 0x6276786e f7090000 52070000 e0116270 ... 61726368 69000ee0 }";
        entityId = 3;
        entityPK = 1;
        lastExportedTransactionNumber = nil;
        moveReceipts =     (
        );
        needsCloudDelete = 0;
        needsLocalDelete = 0;
        needsUpload = 0;
        pendingExportChangeTypeNumber = nil;
        pendingExportTransactionNumber = nil;
        recordZone = nil;
    }),
        <NSCKRecordMetadata: 0x2802cdcc0> (entity: NSCKRecordMetadata; id: 0x9aac7c6d195c7762 <x-coredata://496D2B54-DDB9-47EF-945A-CC1DBA1E14E8/NSCKRecordMetadata/p2>; data: {
        ckRecordName = "0919480D-16CB-49F9-8351-9471371040AC";
        ckRecordSystemFields = nil;
        ckShare = nil;
        encodedRecord = "{length = 50, bytes = 0x6276786e f7090000 52070000 e0116270 ... 61726368 69000ee0 }";
        entityId = 3;
        entityPK = 2;
        lastExportedTransactionNumber = nil;
        moveReceipts =     (
        );
        needsCloudDelete = 0;
        needsLocalDelete = 0;
        needsUpload = 0;
        pendingExportChangeTypeNumber = nil;
        pendingExportTransactionNumber = nil;
        recordZone = nil;
    })
    )}]
    managedObjectContextObjectsDidChange(notification:) [AnyHashable("managedObjectContext"): <_PFWeakReference: 0x2821a3060>, AnyHashable("invalidatedAll"): <__NSArrayM 0x282f75830>(
    
    )
    ]
    AFTER: []  
    

    This indicates to me:

    • Before NSManagedObjectContext.mergeChanges, the table was correct, i.e. it contained both posts p1 & p2.
    • Merging was done again with both posts.
    • In the viewContext, both posts were deleted (AnyHashable("deleted")).
    • The fetchedResultsController responded by deleting both posts also (NSFetchedResultsChangeType(rawValue: 2)).
    • Eventually it is logged that the fetchedResultsController has no objects, and thus the table is empty.

    As a final check, I commented out the code in func processPersistentHistory() that purges the history, and, as expected, the table was displayed correctly, even when I entered another entry in the simulator.

    What are the conclusions?

    • On both persistent stores (simulator & device), and in iCloud, all data were always correct.
    • Merging remote store changes into a context fails if the mirroring software does not have enough time to process its entries in the persistent history.
    • How long this takes probably depends on the amount of data that has to be synced. In my experience, a few KB take a few seconds, but this of course depends on many parameters. If so, 7 days would correspond to several GB to sync, which is rather unusual. In this respect, purging the persistent history after 7 days seems to be a good compromise between storage consumption and correct app operation.
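    If one does purge, the 7-day compromise can be expressed as a date-based delete request instead of the token-based one used in Test 1. This is a hedged sketch: the HistoryPurge.retention constant and the helper names are hypothetical. Per the WWDC22 answer quoted above, Apple's recommendation for NSPersistentCloudKitContainer is not to purge at all.

    ```swift
    import CoreData

    enum HistoryPurge {
        /// Retention window based on the 7-day suggestion above (an assumption,
        /// not an Apple-documented threshold).
        static let retention: TimeInterval = 7 * 24 * 60 * 60
    }

    /// Returns the date before which persistent history would be deleted.
    func historyPurgeCutoff(now: Date = Date(),
                            retention: TimeInterval = HistoryPurge.retention) -> Date {
        return now.addingTimeInterval(-retention)
    }

    /// Deletes persistent history transactions older than the retention window.
    func purgeOldHistory(in context: NSManagedObjectContext, now: Date = Date()) throws {
        let request = NSPersistentHistoryChangeRequest.deleteHistory(
            before: historyPurgeCutoff(now: now))
        var purgeError: Error?
        // Run on the context's queue and surface any error to the caller.
        context.performAndWait {
            do {
                _ = try context.execute(request)
            } catch {
                purgeError = error
            }
        }
        if let error = purgeError { throw error }
    }
    ```

    Compared with deleteHistory(before: lastHistoryToken), a date cutoff leaves recent history in place, regardless of how far this app's own token has advanced.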

    Further hints to reproduce the tests (this may help others who try the same):

    As suggested, I downloaded Apple's demo app and the Core Data stack you modified.
    It did compile for a simulator, but for the device I had to set 3 additional settings in the Signing & Capabilities tab of the target:

    • Set the development team
    • Set the bundle identifier to a reasonable value, e.g. com.<your company>.CoreDataCloudKitDemo.
    • Select the right iCloud container, e.g. iCloud.com.<your company>.CoreDataCloudKitDemo.
    • Additionally, I had to ensure that the simulator and the device were logged in to the same iCloud account. Note that on the simulator, one has to log in again about once a day. Usually one is reminded to do so, but not always.

    Then, I could run the app on the simulator and the device.
    I verified in the CloudKit Console that in the Private Database, zone com.apple.coredata.cloudkit.zone there are no records of type CD_Post. Since data are not shared, the iCloud Sharing database is not used.