I need to decompose an Arabic word into its consonants and vowels. For instance, "ضَرَبَ" has three consonants and three vowels and therefore I would like its length to be 6 instead of 3. However:
import Foundation  // provides decomposedStringWithCanonicalMapping

let t = "ضَرَبَ"
let ud = t.decomposedStringWithCanonicalMapping
print("ud Length = \(ud.count)")
I get 3 instead of 6. How can I decompose this string into the following array:
"\u{0636}\u{064e}\u{0631}\u{064e}\u{0628}\u{064e}"
Your goal here is to count Unicode code points rather than Swift Characters (i.e. extended grapheme clusters), after applying normalization. You can do that with .unicodeScalars:
print("ud Length = \(ud.unicodeScalars.count)") // ud Length = 6
                        ^^^^^^^^^^^^^^
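To see the two views side by side, here is a minimal self-contained sketch. The names word, decomposed, and codePoint(_:) are illustrative only, and the literal is the word from the question written with explicit scalar escapes.

```swift
import Foundation

// The word from the question, written with explicit scalar escapes.
let word = "\u{0636}\u{064E}\u{0631}\u{064E}\u{0628}\u{064E}"
let decomposed = word.decomposedStringWithCanonicalMapping

// Hypothetical helper: formats a scalar as "U+XXXX".
func codePoint(_ scalar: Unicode.Scalar) -> String {
    let hex = String(scalar.value, radix: 16, uppercase: true)
    return "U+" + String(repeating: "0", count: max(0, 4 - hex.count)) + hex
}

let characterCount = decomposed.count                  // extended grapheme clusters
let codePointCount = decomposed.unicodeScalars.count   // Unicode scalars

print("Characters: \(characterCount)")  // Characters: 3
print("Scalars: \(codePointCount)")     // Scalars: 6
print(decomposed.unicodeScalars.map(codePoint).joined(separator: " "))
// U+0636 U+064E U+0631 U+064E U+0628 U+064E
```

Each fatha (U+064E) is a combining mark, so it joins the preceding letter to form a single Character, which is why count reports 3.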
Keep in mind that this is not just "consonants and vowels." Marks such as shaddah and nunation (tanwin) will also be separate code points after normalization. I assume that's a benefit for your use case, but it does mean the count is not simply consonants plus vowels.
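As a rough illustration of that point, the sketch below separates base letters from combining marks using each scalar's general category. It assumes Swift 5 or later (for Unicode.Scalar.Properties). The helper name splitLettersAndMarks and the sample word, which contains a shaddah and a dammatan, are my own choices and not part of the question.

```swift
import Foundation

// Hypothetical helper: separates base letters from combining marks
// after canonical decomposition (NFD).
func splitLettersAndMarks(_ text: String) -> (letters: [Unicode.Scalar], marks: [Unicode.Scalar]) {
    var letters: [Unicode.Scalar] = []
    var marks: [Unicode.Scalar] = []
    for scalar in text.decomposedStringWithCanonicalMapping.unicodeScalars {
        // Arabic short vowels, shaddah, and tanwin are nonspacing marks (Mn).
        if scalar.properties.generalCategory == .nonspacingMark {
            marks.append(scalar)
        } else {
            letters.append(scalar)
        }
    }
    return (letters, marks)
}

// meem + damma, hah + fatha, meem + shaddah + fatha, dal + dammatan
let sample = "\u{0645}\u{064F}\u{062D}\u{064E}\u{0645}\u{0651}\u{064E}\u{062F}\u{064C}"
let parts = splitLettersAndMarks(sample)

print(sample.count)                 // 4 Characters
print(sample.unicodeScalars.count)  // 9 scalars
print(parts.letters.count)          // 4 base letters
print(parts.marks.count)            // 5 marks, shaddah and dammatan included
```

Canonical decomposition may also reorder adjacent marks on the same letter, so relying on categories or counts is likely safer than relying on the position of a particular mark.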
Your question about "decompose this string into the following array" is somewhat misguided. The example you give is a String, not an Array. More importantly, it is the same String as t. (Check it with ==; Swift compares Strings by canonical equivalence, so normalization does not change the result.) If you want an Array of UnicodeScalars, however, that would be Array(ud.unicodeScalars).
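Both claims can be checked in a few lines. This is only a sketch with illustrative names (scalars, values, precomposed, decomposedForm). The second half uses alef with madda (U+0622), whose canonical decomposition is alef followed by a combining madda, to show that == ignores the normalization form while the scalar view does not.

```swift
import Foundation

let t = "\u{0636}\u{064E}\u{0631}\u{064E}\u{0628}\u{064E}"
let ud = t.decomposedStringWithCanonicalMapping

// String equality is based on canonical equivalence.
print(t == ud)  // true

// An actual Array of Unicode.Scalar, and the raw numeric values.
let scalars: [Unicode.Scalar] = Array(ud.unicodeScalars)
let values: [UInt32] = scalars.map { $0.value }
print(values.count)  // 6

// Precomposed vs. decomposed alef with madda: equal as Strings,
// but made of a different number of scalars.
let precomposed = "\u{0622}"
let decomposedForm = precomposed.decomposedStringWithCanonicalMapping
print(precomposed == decomposedForm)        // true
print(precomposed.unicodeScalars.count)     // 1
print(decomposedForm.unicodeScalars.count)  // 2
```

If you need to compare at the code point level rather than by canonical equivalence, compare the unicodeScalars views (or the values arrays) instead of the Strings themselves.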