I noticed something when using NSDataDetector to pull times out of text, and I'm not sure I understand what's going on. In my situation the only information I have is the time, such as "11:30" embedded in a string of text, with no day, month, or year to go with it.
Sample function to extract date info from a string:
    - (NSString *)extractTime:(NSString *)value {
        NSError *error = nil;
        NSDataDetector *detector = [NSDataDetector dataDetectorWithTypes:(NSTextCheckingTypes)NSTextCheckingTypeDate error:&error];
        NSArray *matches = [detector matchesInString:value options:0 range:NSMakeRange(0, [value length])];
        // Keep the last date the detector reports; stays nil if nothing matches.
        NSDate *dateValue = nil;
        for (NSTextCheckingResult *match in matches) {
            if ([match resultType] == NSTextCheckingTypeDate) {
                dateValue = [match date];
            }
        }
        // The formatter uses the system time zone by default.
        NSDateFormatter *formatter = [[NSDateFormatter alloc] init];
        [formatter setDateFormat:@"HH:mm"];
        NSString *time = [formatter stringFromDate:dateValue];
        NSLog(@"original:%@ got_date:%@ formatted_time:%@", value, dateValue, time);
        return time;
    }
I then have a simple test function to throw some time strings at the detector.
    - (void)testTimeExtraction {
        NSArray<NSString *> *times = @[@"07:30", @"8:30", @"9:30", @"10:30", @"11:30"];
        for (NSString *time in times) {
            NSLog(@"%@", [self extractTime:time]);
        }
    }
What I'd expect is time info for 7:30, 8:30, 9:30, etc. Or, failing that, at least consistent times (all in the same timezone).
But what I get varies with my system clock, and I don't understand why or what to do about it. My guess is that, in the absence of a date portion, the detected date is set to the current UTC date. What I don't understand is why the system clock time shifts the results the way it does: some of the detected times are shifted, but not all of them.
If I set my system clock to 06:01 AM, 07:01 AM, or 08:01 AM, I get the same results each time. These look "right", as the times seem to be inferred consistently.
System clock time: 09:01 AM US Central. The 8:30 date is shifted (but not 7:30, 9:30, 10:30, or 11:30).
System clock time: 10:01 AM US Central. Now 8:30 and 9:30 are shifted, but not the others.
System clock time: 11:01 AM US Central. And so on...
My assumption is that I'm missing something fundamental about date handling and date extraction, but it seems really weird to me that only a subset of the detected dates shift, depending on the system clock time.
Any clue on why this is happening would be most appreciated.
These are heuristics: Data Detectors tries to guess the most probable date. If you scan "8:30" at 9:01 AM, Data Detectors assumes the text more likely refers to 8:30 PM (in the future) than to 8:30 AM (in the past). This is why the formatted time you get is 20:30 (8:30 PM).
If you look at your various tests, you'll see that the dates assumed to be PM are always the ones that would be in the past relative to the current time if they had been detected as AM.
You should not assume that this is what will always happen, either. This behavior is locale-specific.
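If all you need is the clock time exactly as written, one way to sidestep the heuristic is to skip NSDate entirely and match the digits yourself. The following is only a minimal sketch under some assumptions: `ExtractClockTime` is a hypothetical helper (it is not part of Foundation or of the code in the question), the text is assumed to use a 24-hour "H:mm" or "HH:mm" form, and only the first match is returned.

```objective-c
#import <Foundation/Foundation.h>

// Hypothetical helper, not from the original post: returns the first
// "H:mm" / "HH:mm" token in `value` as a zero-padded "HH:mm" string,
// or nil if there is none. No NSDate is built, so the current time,
// time zone, and locale cannot affect the result.
static NSString *ExtractClockTime(NSString *value) {
    NSError *error = nil;
    // Hours 0-23 (optional leading zero), a colon, minutes 00-59.
    NSRegularExpression *regex =
        [NSRegularExpression regularExpressionWithPattern:@"\\b([01]?[0-9]|2[0-3]):([0-5][0-9])\\b"
                                                  options:0
                                                    error:&error];
    if (regex == nil) {
        return nil;
    }
    NSTextCheckingResult *match =
        [regex firstMatchInString:value options:0 range:NSMakeRange(0, value.length)];
    if (match == nil) {
        return nil;
    }
    // Capture group 1 is the hour, group 2 is the minute.
    NSInteger hour = [[value substringWithRange:[match rangeAtIndex:1]] integerValue];
    NSInteger minute = [[value substringWithRange:[match rangeAtIndex:2]] integerValue];
    return [NSString stringWithFormat:@"%02ld:%02ld", (long)hour, (long)minute];
}
```

Called as `ExtractClockTime(@"8:30")`, this gives the same string regardless of the system clock. The trade-off is that it makes no attempt to decide whether "8:30" means AM or PM; it simply reports what the text says, which is an assumption about what you want.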