Search code examples
iosarraysobjective-cfilterbinary-search

Filter NSArray based on attribute prefix using indexOfObject:inSortedRange:options:usingComparator: binary search


I'm trying to implement a simple binary search on an array using indexOfObject:inSortedRange:options:usingComparator: and the behavior of this method is not quite what I expect and I have no idea what is missed.

Let's dive into details:

  • The array I'm trying to find items with is a custom object named City with has a readableName property of type NSString on it. A sample value of readableName is like Alabama, US.
  • The array is already sorted alphabetically based on the readableName property.
  • What I'm trying to achieve using this binary search is to be able to filter the array based on a prefix that is searched, for example, if Ams is searched, I can filter the cities that start with "Ams" like Amsterdam, Amstelveen, etc.
  • Since the array is sorted I'm trying to find the first item that has this prefix and the last item that has this prefix and create a subarray from the items in between.
  • The followings are the methods using which I have implemented the search for the head and the tail of the range that contains the cities with a given prefix:
 + (FilterRange *)findRangeHeadAndTailForPrefix:(NSString *)prefix inCityArray:(NSArray *)array {
    FilterRange *result = [[FilterRange alloc] init];
    result.startIndex = [self findRangeBordersForPrefix:prefix inArray:array lookingForHead:YES];
    result.endIndex = [self findRangeBordersForPrefix:prefix inArray:array lookingForHead:NO];
    return result;
}


 + (long)findRangeBordersForPrefix:(NSString *)prefix inArray:(NSArray *)array lookingForHead:(BOOL)shouldLookForHead {
    NSRange searchRange = NSMakeRange(0, [array count]);
    long foundIndex = [array indexOfObject:prefix
                             inSortedRange:searchRange
                                   options:(shouldLookForHead ? NSBinarySearchingFirstEqual : NSBinarySearchingLastEqual)
                           usingComparator:^(id obj1, id obj2)
                       {
        City *castedCity = (City *)([obj1 isKindOfClass:[City class]] ? obj1 : obj2);
        NSString *castedPrefix = (NSString *)([obj1 isKindOfClass:[City class]] ? obj2 : obj1);
        NSComparisonResult comparisonResult = ([[[castedCity readableName] lowercaseString] hasPrefix:[castedPrefix lowercaseString]] ? NSOrderedSame :
                                               [[[castedCity readableName] lowercaseString] compare:[castedPrefix lowercaseString]]);
        return comparisonResult;
    }];
    
    return foundIndex;
}

The problem is the behavior of indexOfObject:inSortedRange:options:usingComparator: method and here's how it behaves (Saw this using breakpoints and step-by-step execution of the comparator):

  • For the startIndex, the comparator is called with the last object in the array and the prefix and the result is NSOrderedDescending
  • Then the comparator is called with the first object in the array and the prefix and the result is NSOrderedAscending
  • Then it immediately stops searching and comparing other array items and returns the maximum long number value.
  • The same happens for the endIndex

So the search is never properly performed. Please kindly note that I don't want to using filterUsingPredicate because of its time complexity. The array is already sorted, so a much better efficiency level can be achieved via binary search.

Does anyone have any idea that what I might have been missed. I guess there's something really obvious and I'm not paying attention to it. Any help or idea is really appreciated :)


Solution

  • Normalization

    First problem I see is that you're using lowercase string which doesn't work well for accented characters, ... As a first thing, let's write some helper to normalize a string.

    @interface NSString(Normalize)
    
    - (NSString *)normalized;
    
    @end
    
    @implementation NSString(Normalize)
    
    - (NSString *)normalized {
        NSMutableString *result = [NSMutableString stringWithString:self];
        CFStringTransform((__bridge CFMutableStringRef)result, NULL, kCFStringTransformStripCombiningMarks, NO);
        return [result lowercaseString];
    }
    
    @end
    

    This method returns lowercased string with stripped combining marks. Not a very performant version, but you have an idea what needs to be done here.

    Caching

    Normalization can be expensive, let's cache it.

    @interface City: NSObject
    
    @property(nonatomic, strong) NSString *readableName;
    @property(nonatomic, strong, readonly) NSString *normalizedReadableName;
    
    @end
    
    @implementation City {
        NSString *_normalizedReadableName;
    }
    
    - (instancetype)initWithName:(NSString *)name {
        if ((self = [super init]) == nil) { return nil; }
        _readableName = name;
        _normalizedReadableName = nil;    
        return self;
    }
    
    - (NSString *)normalizedReadableName {
        if (_normalizedReadableName == nil) {
            _normalizedReadableName = [_readableName normalized];
        }
        return _normalizedReadableName;
    }
    
    - (void)setReadableName:(NSString *)readableName {
        _readableName = readableName;
        _normalizedReadableName = nil;
    }
    
    +(instancetype)cityWithName:(NSString *)name {
        return [[self alloc] initWithName:name];
    }
    
    @end
    

    Again, it's up to you how you'd like to proceed here. Take it as an example.

    Search

    indexOfObject:inSortedRange:options:usingComparator: says:

    The elements in the array must have already been sorted using the comparator cmp (it's the usingComparator argument). If the array is not sorted, the result is undefined.

    You wrote:

    The array is already sorted alphabetically based on the readableName property.

    But in your comparator, you're using lowercaseString. It's unclear if it's sorted by lowercased string or not, can be another issue. Otherwise the result is undefined. We have to operate on the same string (sort, compare, hasPrefix, ...) - this is the reason for the normalization dance.

    Let's create a sample array, shuffle it and sort it.

    NSArray *shuffledCities = [@[
        [City cityWithName:@"Čáslav"],
        [City cityWithName:@"Čelákovice"],
        [City cityWithName:@"Černošice"],
        [City cityWithName:@"Černošín"],
        [City cityWithName:@"Černovice"],
        [City cityWithName:@"Červená Řečice"],
        [City cityWithName:@"Červený Kostelec"],
        [City cityWithName:@"Česká Kamenice"],
        [City cityWithName:@"Česká Lípa"],
        [City cityWithName:@"Česká Skalice"],
        [City cityWithName:@"Česká Třebová"],
        [City cityWithName:@"České Budějovice"],
        [City cityWithName:@"České Velenice"],
        [City cityWithName:@"Český Brod"],
        [City cityWithName:@"Český Dub"],
        [City cityWithName:@"Český Krumlov"],
        [City cityWithName:@"Český Těšín"],
        [City cityWithName:@"Chodová Planá"]
    ] shuffledArray]; // It's from the GameplayKit.framework
    
    NSArray *sortedCities = [shuffledCities sortedArrayUsingComparator:^NSComparisonResult(City *_Nonnull city1, City *_Nonnull city2) {
        return [city1.normalizedReadableName compare:city2.normalizedReadableName];
    }];
    

    Important point here is that we're sorting by normalizedReadableName property.

    Let's pretend that the prefix is an argument from your function - we have to normalize it as well ...

    NSString *prefix = @"čEsKÝ dub";
    NSString *normalizedPrefix = [prefix normalized];
    

    ... otherwise our comparator won't work:

    NSComparisonResult (^comparator)(id  _Nonnull, id  _Nonnull) = ^(id _Nonnull obj1, id  _Nonnull obj2) {
        // One has to be City and another one NSString
        assert([obj1 isKindOfClass:[NSString class]] || [obj2 isKindOfClass:[NSString class]]);
        assert([obj1 isKindOfClass:[City class]] || [obj2 isKindOfClass:[City class]]);
    
        
        if ([obj1 isKindOfClass:[City class]]) {
            return [[obj1 normalizedReadableName] hasPrefix:obj2] ? NSOrderedSame : [[obj1 normalizedReadableName] compare:obj2];
        } else {
            return [[obj2 normalizedReadableName] hasPrefix:obj1] ? NSOrderedSame : [obj1 compare:[obj2 normalizedReadableName]];
        }
    };
    

    Another issue I see is that your comparator is wrong if obj2 is City. Comparator expects comparison of [obj1 compare:obj2], but in this case your comparator is returning [obj2 compare:obj1] (obj2 is City, obj1 is NSString).

    We have fixed the comparator, let's search for the first city:

    NSUInteger first = [sortedCities indexOfObject:normalizedPrefix
                                     inSortedRange:NSMakeRange(0, sortedCities.count)
                                           options:NSBinarySearchingFirstEqual
                                   usingComparator:comparator];
    
    if (first == NSNotFound) {
        NSLog(@"Prefix \"%@\" not found", prefix);
        return;
    }
    

    If found, search for the 2nd one:

    NSUInteger last = [sortedCities indexOfObject:normalizedPrefix
                                    inSortedRange:NSMakeRange(first, sortedCities.count - first)
                                          options:NSBinarySearchingLastEqual
                                  usingComparator:comparator];
    
    // Shouldn't happen as our search range includes the first one
    assert(last != NSNotFound);
    
    NSLog(@"Prefix \"%@\" found", prefix);
    NSLog(@" - First %lu: \"%@\"", (unsigned long)first, [sortedCities[first] readableName]);
    NSLog(@" - Last %lu: \"%@\"", (unsigned long)last, [sortedCities[last] readableName]);
    

    Sample outputs

    All of them are correct.

    Prefix "čEsKÝ dub" found
     - First 14: "Český Dub"
     - Last 14: "Český Dub"
    
    Prefix "Praha" not found
    
    Prefix "ceskÝ" found
     - First 13: "Český Brod"
     - Last 16: "Český Těšín"
    
    Prefix "cernos" found
     - First 2: "Černošice"
     - Last 3: "Černošín"