core-data utf-8 utf-16 corestore

UTF16 stored string doesn't match once retrieved back from CoreData


I am using CoreStore to save a string identifier in CoreData. The string may contain non-ASCII Swedish characters such as å, ä and ö. Inspecting it in the debugger console:

> po identifier
"/EXTERNAL/Gemensam RUN/FileCloud Test/Test folder åäö/Test with Swedish characters - åäö.xlsx"

Immediately after being saved back to CoreData:

> po record
<File: 0x281e140a0> (entity: File; id: 0xdcac6620f1e9eb63 <x-coredata://BA0168AF-92CE-4AC2-A934-1020E41C5C20/File/p615>; data: {
    // ...
    identifier = "[email protected]@files.runcloud.se/EXTERNAL/Gemensam RUN/FileCloud Test/Test folder \U00e5\U00e4\U00f6/Test with Swedish characters - \U00e5\U00e4\U00f6.xlsx";
    // ...
})

It looks as if the UTF16 string has been stored as a UTF8 one. It is still a valid one, though, since:

> po record.identifier == identifier
true

The problem comes later, when I try to fetch the record using a Swedish identifier string like the original one above: it no longer matches.

CoreStore.fetchOne(From<Record>().where(\.identifier == identifier)) // Fails

How could I convert identifier to a representation that would match the stored CoreData value?

Update

Even stranger, a hardcoded identifier does succeed:

CoreStore.fetchOne(From<Record>().where(\.identifier == "[email protected]@files.runcloud.se/EXTERNAL/Gemensam RUN/FileCloud Test/Test folder åäö/Test with Swedish characters - åäö.xlsx")) // Works

And identifier and this hardcoded string do match:

po identifier == "[email protected]@files.runcloud.se/EXTERNAL/Gemensam RUN/FileCloud Test/Test folder åäö/Test with Swedish characters - åäö.xlsx"
true

But fetching with identifier instead of the hardcoded string fails.

Update 2

Comparing .unicodeScalars of identifier and the hardcoded string shows that they are indeed different, even though == reports the two strings as equal.
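A minimal sketch of how this can happen: Swift's == compares strings by canonical equivalence, so two strings can be equal while their Unicode scalars differ. The sample strings below are illustrative only, not the actual identifiers from the question.

```swift
// Two spellings of "åäö": precomposed scalars vs. base letters plus combining marks.
// These sample values are assumptions for illustration, not the real identifiers.
let precomposed = "\u{00E5}\u{00E4}\u{00F6}"
let decomposed = "a\u{030A}a\u{0308}o\u{0308}"

// Swift's String == uses canonical equivalence, so these compare equal...
print(precomposed == decomposed) // true

// ...even though the underlying scalar sequences are different.
print(precomposed.unicodeScalars.elementsEqual(decomposed.unicodeScalars)) // false

// A byte-wise comparison sees different data as well.
print(Array(precomposed.utf8) == Array(decomposed.utf8)) // false

// Dump the scalars of each string to see where they diverge.
for text in [precomposed, decomposed] {
    let dump = text.unicodeScalars
        .map { "U+" + String($0.value, radix: 16, uppercase: true) }
        .joined(separator: " ")
    print(dump)
}
```

A store that compares the raw bytes or scalars rather than applying canonical equivalence would treat these two values as different keys.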


Solution

  • CoreData does save and return strings exactly as given.

    The problem with fetching by strings that contain such characters is that CoreData (and most probably SQLite behind it) does not consider my two strings equal, because they are built from different Unicode scalars: a character such as å can be stored precomposed (U+00E5) or as the letter a followed by a combining ring (U+0061 U+030A). Both strings are valid and compare equal in Swift, but not in CoreData when used as values to fetch objects.

    I did not find a proper way to convert between these representations in Swift, so my workaround was to recreate the process that led to the original representation in the first place. This involved first creating a URL from the string and then letting the FileProvider framework produce the same representation by calling persistentIdentifierForItem(at: url)!.rawValue. I then use this value to fetch my saved object.
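A possible alternative, which is not what the workaround above does and which I have not verified against CoreStore: Foundation exposes precomposedStringWithCanonicalMapping (NFC) and decomposedStringWithCanonicalMapping (NFD) on strings. Normalizing the identifier to one form both before saving and before fetching should give byte-identical values. The helper name normalizedIdentifier and the sample strings are hypothetical.

```swift
import Foundation

/// Hypothetical helper: returns the NFC (precomposed) form of an identifier.
func normalizedIdentifier(_ raw: String) -> String {
    return raw.precomposedStringWithCanonicalMapping
}

// Sample values (assumptions): the same folder name in decomposed and precomposed form.
let fromFileSystem = "Test folder a\u{030A}a\u{0308}o\u{0308}"
let typedInSource = "Test folder \u{00E5}\u{00E4}\u{00F6}"

// Before normalization the scalar sequences differ.
print(fromFileSystem.unicodeScalars.elementsEqual(typedInSource.unicodeScalars)) // false

// After normalization both strings are made of the same scalars.
let a = normalizedIdentifier(fromFileSystem)
let b = normalizedIdentifier(typedInSource)
print(a.unicodeScalars.elementsEqual(b.unicodeScalars)) // true
```

With this approach the fetch would use .where(\.identifier == normalizedIdentifier(identifier)). Records already stored in a different form would need to be re-saved with the normalized value first.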