I've been using NSLinguisticTagger with sentences and have been encountering a strange issue with sentences such as 'I am hungry' or 'I am drunk'. Whilst one would expect 'I' to be tagged as a pronoun, 'am' as a verb and 'hungry' as an adjective, they are not. Rather, they are all tagged as OtherWord.
Is there something I'm doing incorrectly?
NSString *input = @"I am hungry";
NSLinguisticTaggerOptions options = NSLinguisticTaggerOmitWhitespace;
NSLinguisticTagger *tagger = [[NSLinguisticTagger alloc] initWithTagSchemes:[NSLinguisticTagger availableTagSchemesForLanguage:@"en"] options:options];
tagger.string = input;
[tagger enumerateTagsInRange:NSMakeRange(0, input.length) scheme:NSLinguisticTagSchemeNameTypeOrLexicalClass options:options usingBlock:^(NSString *tag, NSRange tokenRange, NSRange sentenceRange, BOOL *stop) {
    NSString *token = [input substringWithRange:tokenRange];
    // Look up the lemma for the same token; the range out-parameters are not needed here.
    NSString *lemma = [tagger tagAtIndex:tokenRange.location
                                  scheme:NSLinguisticTagSchemeLemma
                              tokenRange:NULL
                           sentenceRange:NULL];
    NSLog(@"%@ (%@) : %@\n", token, lemma, tag);
}];
And the output is:
I ((null)) : OtherWord
am ((null)) : OtherWord
hungry ((null)) : OtherWord
After quite some time in chat we found the issue:
The sentence does not contain enough information to determine its language.
To fix this you can either:
Add a demo sentence in your language of choice after your actual sentence. The extra text should give the tagger enough material to detect your preferred language.
OR
Tell the tagger what language to use: add the line
[tagger setOrthography:[NSOrthography orthographyWithDominantScript:@"Latn" languageMap:@{@"Latn" : @[@"en"]}] range:NSMakeRange(0, input.length)];
before the enumerate call. That way you explicitly tell the tagger what language you want the text to be in, in this case English (en) written in the Latin script (Latn).
If you don't know the language for sure, it may be useful to use either of these methods only as a fallback when the words get tagged as OtherWord, meaning the language could not be detected.
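The fallback idea can be sketched roughly as follows. This is only an illustration, not tested code: the helper AllTokensAreOtherWord is a made-up name, and it assumes that setting an orthography after a first enumeration makes the tagger re-tag the text. The default language (English in Latin script) is likewise an assumption you would replace with your own.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper (not part of Foundation): returns YES when the string
// contains at least one token and every token is tagged OtherWord.
static BOOL AllTokensAreOtherWord(NSLinguisticTagger *tagger, NSString *input) {
    __block BOOL sawToken = NO;
    __block BOOL allOther = YES;
    NSLinguisticTaggerOptions options = NSLinguisticTaggerOmitWhitespace | NSLinguisticTaggerOmitPunctuation;
    [tagger enumerateTagsInRange:NSMakeRange(0, input.length)
                          scheme:NSLinguisticTagSchemeNameTypeOrLexicalClass
                         options:options
                      usingBlock:^(NSString *tag, NSRange tokenRange, NSRange sentenceRange, BOOL *stop) {
        sawToken = YES;
        if (![tag isEqualToString:NSLinguisticTagOtherWord]) {
            allOther = NO;
            *stop = YES; // one real tag is enough to know detection worked
        }
    }];
    return sawToken && allOther;
}

int main(void) {
    @autoreleasepool {
        NSString *input = @"I am hungry";
        NSLinguisticTaggerOptions options = NSLinguisticTaggerOmitWhitespace;
        NSLinguisticTagger *tagger = [[NSLinguisticTagger alloc] initWithTagSchemes:[NSLinguisticTagger availableTagSchemesForLanguage:@"en"] options:options];
        tagger.string = input;
        NSRange fullRange = NSMakeRange(0, input.length);

        // First pass relies on automatic language detection.
        if (AllTokensAreOtherWord(tagger, input)) {
            // Fallback: assume English written in Latin script.
            NSOrthography *english = [NSOrthography orthographyWithDominantScript:@"Latn" languageMap:@{@"Latn" : @[@"en"]}];
            [tagger setOrthography:english range:fullRange];
        }

        [tagger enumerateTagsInRange:fullRange
                              scheme:NSLinguisticTagSchemeNameTypeOrLexicalClass
                             options:options
                          usingBlock:^(NSString *tag, NSRange tokenRange, NSRange sentenceRange, BOOL *stop) {
            NSLog(@"%@ : %@", [input substringWithRange:tokenRange], tag);
        }];
    }
    return 0;
}
```

Doing it this way keeps automatic detection for longer texts where it works, and only forces a language on short inputs where every token comes back as OtherWord.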