I'm trying to implement a simple binary search on an array using indexOfObject:inSortedRange:options:usingComparator:
and the behavior of this method is not quite what I expect and I have no idea what is missed.
Let's dive into details:
City
with has a readableName
property of type NSString
on it. A sample value of readableName is like Alabama, US
.readableName
property.Ams
is searched, I can filter the cities that start with "Ams" like Amsterdam, Amstelveen, etc. + (FilterRange *)findRangeHeadAndTailForPrefix:(NSString *)prefix inCityArray:(NSArray *)array {
FilterRange *result = [[FilterRange alloc] init];
result.startIndex = [self findRangeBordersForPrefix:prefix inArray:array lookingForHead:YES];
result.endIndex = [self findRangeBordersForPrefix:prefix inArray:array lookingForHead:NO];
return result;
}
+ (long)findRangeBordersForPrefix:(NSString *)prefix inArray:(NSArray *)array lookingForHead:(BOOL)shouldLookForHead {
NSRange searchRange = NSMakeRange(0, [array count]);
long foundIndex = [array indexOfObject:prefix
inSortedRange:searchRange
options:(shouldLookForHead ? NSBinarySearchingFirstEqual : NSBinarySearchingLastEqual)
usingComparator:^(id obj1, id obj2)
{
City *castedCity = (City *)([obj1 isKindOfClass:[City class]] ? obj1 : obj2);
NSString *castedPrefix = (NSString *)([obj1 isKindOfClass:[City class]] ? obj2 : obj1);
NSComparisonResult comparisonResult = ([[[castedCity readableName] lowercaseString] hasPrefix:[castedPrefix lowercaseString]] ? NSOrderedSame :
[[[castedCity readableName] lowercaseString] compare:[castedPrefix lowercaseString]]);
return comparisonResult;
}];
return foundIndex;
}
The problem is the behavior of indexOfObject:inSortedRange:options:usingComparator:
method and here's how it behaves (Saw this using breakpoints and step-by-step execution of the comparator):
NSOrderedDescending
NSOrderedAscending
long
number value.So the search is never properly performed.
Please kindly note that I don't want to using filterUsingPredicate
because of its time complexity. The array is already sorted, so a much better efficiency level can be achieved via binary search.
Does anyone have any idea that what I might have been missed. I guess there's something really obvious and I'm not paying attention to it. Any help or idea is really appreciated :)
First problem I see is that you're using lowercase
string which doesn't work well for accented characters, ... As a first thing, let's write some helper to normalize a string.
@interface NSString(Normalize)
- (NSString *)normalized;
@end
@implementation NSString(Normalize)
- (NSString *)normalized {
NSMutableString *result = [NSMutableString stringWithString:self];
CFStringTransform((__bridge CFMutableStringRef)result, NULL, kCFStringTransformStripCombiningMarks, NO);
return [result lowercaseString];
}
@end
This method returns lowercased string with stripped combining marks. Not a very performant version, but you have an idea what needs to be done here.
Normalization can be expensive, let's cache it.
@interface City: NSObject
@property(nonatomic, strong) NSString *readableName;
@property(nonatomic, strong, readonly) NSString *normalizedReadableName;
@end
@implementation City {
NSString *_normalizedReadableName;
}
- (instancetype)initWithName:(NSString *)name {
if ((self = [super init]) == nil) { return nil; }
_readableName = name;
_normalizedReadableName = nil;
return self;
}
- (NSString *)normalizedReadableName {
if (_normalizedReadableName == nil) {
_normalizedReadableName = [_readableName normalized];
}
return _normalizedReadableName;
}
- (void)setReadableName:(NSString *)readableName {
_readableName = readableName;
_normalizedReadableName = nil;
}
+(instancetype)cityWithName:(NSString *)name {
return [[self alloc] initWithName:name];
}
@end
Again, it's up to you how you'd like to proceed here. Take it as an example.
indexOfObject:inSortedRange:options:usingComparator:
says:
The elements in the array must have already been sorted using the comparator
cmp
(it's theusingComparator
argument). If the array is not sorted, the result is undefined.
You wrote:
The array is already sorted alphabetically based on the
readableName
property.
But in your comparator, you're using lowercaseString
. It's unclear if it's sorted by lowercased string or not, can be another issue. Otherwise the result is undefined. We have to operate on the same string (sort, compare, hasPrefix, ...) - this is the reason for the normalization dance.
Let's create a sample array, shuffle it and sort it.
NSArray *shuffledCities = [@[
[City cityWithName:@"Čáslav"],
[City cityWithName:@"Čelákovice"],
[City cityWithName:@"Černošice"],
[City cityWithName:@"Černošín"],
[City cityWithName:@"Černovice"],
[City cityWithName:@"Červená Řečice"],
[City cityWithName:@"Červený Kostelec"],
[City cityWithName:@"Česká Kamenice"],
[City cityWithName:@"Česká Lípa"],
[City cityWithName:@"Česká Skalice"],
[City cityWithName:@"Česká Třebová"],
[City cityWithName:@"České Budějovice"],
[City cityWithName:@"České Velenice"],
[City cityWithName:@"Český Brod"],
[City cityWithName:@"Český Dub"],
[City cityWithName:@"Český Krumlov"],
[City cityWithName:@"Český Těšín"],
[City cityWithName:@"Chodová Planá"]
] shuffledArray]; // It's from the GameplayKit.framework
NSArray *sortedCities = [shuffledCities sortedArrayUsingComparator:^NSComparisonResult(City *_Nonnull city1, City *_Nonnull city2) {
return [city1.normalizedReadableName compare:city2.normalizedReadableName];
}];
Important point here is that we're sorting by normalizedReadableName
property.
Let's pretend that the prefix
is an argument from your function - we have to normalize it as well ...
NSString *prefix = @"čEsKÝ dub";
NSString *normalizedPrefix = [prefix normalized];
... otherwise our comparator won't work:
NSComparisonResult (^comparator)(id _Nonnull, id _Nonnull) = ^(id _Nonnull obj1, id _Nonnull obj2) {
// One has to be City and another one NSString
assert([obj1 isKindOfClass:[NSString class]] || [obj2 isKindOfClass:[NSString class]]);
assert([obj1 isKindOfClass:[City class]] || [obj2 isKindOfClass:[City class]]);
if ([obj1 isKindOfClass:[City class]]) {
return [[obj1 normalizedReadableName] hasPrefix:obj2] ? NSOrderedSame : [[obj1 normalizedReadableName] compare:obj2];
} else {
return [[obj2 normalizedReadableName] hasPrefix:obj1] ? NSOrderedSame : [obj1 compare:[obj2 normalizedReadableName]];
}
};
Another issue I see is that your comparator is wrong if obj2
is City
. Comparator expects comparison of [obj1 compare:obj2]
, but in this case your comparator is returning [obj2 compare:obj1]
(obj2
is City
, obj1
is NSString
).
We have fixed the comparator, let's search for the first city:
NSUInteger first = [sortedCities indexOfObject:normalizedPrefix
inSortedRange:NSMakeRange(0, sortedCities.count)
options:NSBinarySearchingFirstEqual
usingComparator:comparator];
if (first == NSNotFound) {
NSLog(@"Prefix \"%@\" not found", prefix);
return;
}
If found, search for the 2nd one:
NSUInteger last = [sortedCities indexOfObject:normalizedPrefix
inSortedRange:NSMakeRange(first, sortedCities.count - first)
options:NSBinarySearchingLastEqual
usingComparator:comparator];
// Shouldn't happen as our search range includes the first one
assert(last != NSNotFound);
NSLog(@"Prefix \"%@\" found", prefix);
NSLog(@" - First %lu: \"%@\"", (unsigned long)first, [sortedCities[first] readableName]);
NSLog(@" - Last %lu: \"%@\"", (unsigned long)last, [sortedCities[last] readableName]);
All of them are correct.
Prefix "čEsKÝ dub" found
- First 14: "Český Dub"
- Last 14: "Český Dub"
Prefix "Praha" not found
Prefix "ceskÝ" found
- First 13: "Český Brod"
- Last 16: "Český Těšín"
Prefix "cernos" found
- First 2: "Černošice"
- Last 3: "Černošín"