
NSString.rangeOfString returns unusual result with non-Latin characters


I need to get the range of two words in a string, for example:

ยัฟิแก ไฟหก

(This is literally me typing "PYABCD WASD" on a Thai keyboard layout; it's a nonsensical test string since I don't speak Thai.)

//Find all the ranges of each word
var words:  [String]    = []
var ranges: [NSRange]   = []

//Convert to nsstring first because otherwise you get stuck with Ranges and Strings.
let nstext = backgroundTextField.stringValue as NSString //contains "ยัฟิแก ไฟหก"
words  = nstext.componentsSeparatedByString(" ")
var nstextLessWordsWeHaveRangesFor = nstext //if you have two identical words this prevents just getting the first word's range

for word in words
{
    let range:NSRange = nstextLessWordsWeHaveRangesFor.rangeOfString(word)
    Swift.print(range)
    ranges.append(range)

    //create a string the same length as word
    var fillerString:String = ""

    for i in 0..<word.characters.count {
        Swift.print("i: \(i)")
        fillerString = fillerString.stringByAppendingString(" ")
    }

    //remove duplicate words / letters so that we get correct range each time.
    if range.length <= nstextLessWordsWeHaveRangesFor.length
    {
        nstextLessWordsWeHaveRangesFor = nstextLessWordsWeHaveRangesFor.stringByReplacingCharactersInRange(range, withString: fillerString)
    }
}

This outputs:

(0,6)
(5,4)

Those ranges overlap: the first covers locations 0 through 5, and the second starts at location 5.

This causes problems later on, when I pass these ranges to NSLayoutManager.enumerateEnclosingRectsForGlyphRange, because they no longer line up with the actual text.

How can I get the correct range (or in this specific case, non-overlapping ranges)?


Solution

  • Swift String characters are "extended grapheme clusters", while NSString counts UTF-16 code units, so the length of a string differs depending on which representation you use.

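    For example, the first character "ยั" is actually the combination of "ย" (U+0E22) with the diacritical mark " ั" (U+0E31). That counts as one String character, but as two NSString characters. The word "ยัฟิแก" is therefore 4 characters long as a String but 6 as an NSString, so your loop replaces a 6-unit range with only 4 spaces. The string shrinks by 2, and every later index shifts left by 2: the second word is then found at location 5 instead of 7.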

    The simplest solution is to stick to one, either String or NSString (if possible). Since you are working with NSString, changing

    for i in 0..<word.characters.count {

    to

    for i in 0..<range.length {

    should solve the problem. The creation of the filler string can be simplified to

    //create a string the same length as word
    let fillerString = String(count: range.length, repeatedValue: Character(" "))
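A quick way to see the mismatch for yourself is to compare both measures on the first word. This is a minimal sketch in current Swift (5.x) with Foundation, not the Swift 2 used above. The Unicode escapes spell out the same scalars as "ยัฟิแก", so the literal can't be altered by copy and paste.

```swift
import Foundation

// "ยัฟิแก": ย + U+0E31, ฟ + U+0E34, แ, ก
let word = "\u{0E22}\u{0E31}\u{0E1F}\u{0E34}\u{0E41}\u{0E01}"

// Swift's count measures extended grapheme clusters: "ยั", "ฟิ", "แ", "ก".
let graphemeCount = word.count

// NSString's length measures UTF-16 code units; each combining mark adds one.
let utf16Length = NSString(string: word).length

// The pure-Swift way to get the same UTF-16 length without NSString.
print(graphemeCount, utf16Length, word.utf16.count) // prints "4 6 6"
```

Any index arithmetic that mixes the first number with NSRange values (which use the second) will drift by the difference.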
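Putting the fix together, here is a sketch of the whole loop written for current Swift (5.x) rather than Swift 2. The Swift 2 calls map to components(separatedBy:), range(of:), replacingCharacters(in:with:), and String(repeating:count:), which is the Swift 3+ spelling of String(count:repeatedValue:). The function name wordRanges(in:) is hypothetical; it takes the text directly instead of reading backgroundTextField, and it skips words that come back as NSNotFound instead of relying on the range.length check.

```swift
import Foundation

// Hypothetical helper: returns the UTF-16 based NSRange of each space-separated
// word, blanking out each match so that repeated words get distinct ranges.
func wordRanges(in text: String) -> [NSRange] {
    var remaining = NSString(string: text)
    var result: [NSRange] = []
    for word in text.components(separatedBy: " ") {
        let range = remaining.range(of: word)
        // Empty words (from double spaces) and missing words are skipped.
        guard range.location != NSNotFound else { continue }
        result.append(range)
        // The filler length is range.length (UTF-16 units), not word.count
        // (grapheme clusters), so the string keeps its length and later
        // locations don't shift.
        let filler = String(repeating: " ", count: range.length)
        remaining = NSString(string: remaining.replacingCharacters(in: range, with: filler))
    }
    return result
}

// "ยัฟิแก ไฟหก" written with Unicode escapes.
let thaiRanges = wordRanges(in: "\u{0E22}\u{0E31}\u{0E1F}\u{0E34}\u{0E41}\u{0E01} \u{0E44}\u{0E1F}\u{0E2B}\u{0E01}")
for r in thaiRanges {
    print(r.location, r.length) // prints "0 6" then "7 4"
}
```

The second range now starts at 7, right after the space, so the two ranges no longer overlap.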
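A different technique, not the one this answer uses, avoids rewriting the string at all: search for each word only in the part of the text after the previous match, using NSString's range(of:options:range:). This is a sketch under the same assumptions (current Swift with Foundation; wordRangesBySearchRange(in:) is a hypothetical name). Because every location and length comes from NSString, all of the arithmetic stays in UTF-16 units.

```swift
import Foundation

// Hypothetical helper: finds each space-separated word by searching only
// after the end of the previous match, so duplicates get distinct ranges.
func wordRangesBySearchRange(in text: String) -> [NSRange] {
    let ns = NSString(string: text)
    var result: [NSRange] = []
    var searchStart = 0 // UTF-16 offset where the next search begins
    for word in text.components(separatedBy: " ") where !word.isEmpty {
        let searchRange = NSRange(location: searchStart, length: ns.length - searchStart)
        let found = ns.range(of: word, options: [], range: searchRange)
        guard found.location != NSNotFound else { continue }
        result.append(found)
        searchStart = found.location + found.length
    }
    return result
}

// "ยัฟิแก ไฟหก" written with Unicode escapes.
let searched = wordRangesBySearchRange(in: "\u{0E22}\u{0E31}\u{0E1F}\u{0E34}\u{0E41}\u{0E01} \u{0E44}\u{0E1F}\u{0E2B}\u{0E01}")
for r in searched {
    print(r.location, r.length)
}
```

This relies on the words being searched in the order they occur in the text, which holds here because they come from splitting that same string.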