
Getting NSString to treat "regular English letters" and characters like emoji or Japanese uniformly


There is a text field in which I can enter characters. The characters can be a, b, c, d, etc., or a smiley face added using the emoji keyboard.

-(void)textFieldDidEndEditing:(UITextField *)textField{
    NSLog(@"len:%lu", (unsigned long)textField.text.length);
    NSLog(@"char:%c",[textField.text characterAtIndex:0]);
}

Currently, the above function gives the following output:

if textField.text = @"qq"
len:2
char:q

if textField.text = @"😄q"
len:3
char:=

What I need is

if textField.text = @"qq"
len:2
char:q

if textField.text = @"😄q"
len:2
char:😄

Any clue how to do this?


Solution

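  • NSString stores text as UTF-16 code units, so any character outside the Basic Multilingual Plane (Unicode planes above 0, which includes most emoji) takes two unichars, stored as a surrogate pair. That makes this difficult: it seems necessary to enumerate the composed character sequences to get the actual length.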

    Note: The NSString method length does not return the number of characters but the number of UTF-16 code units (unichars). See "NSString and Unicode" in Strings, objc.io issue #9.

    Example code:

    NSString *text = @"qqq😄rrr";
    int maxCharacters = 4;

    // unicharCount: UTF-16 code units consumed so far.
    // charCount: composed character sequences seen so far.
    __block NSInteger unicharCount = 0;
    __block NSInteger charCount = 0;
    [text enumerateSubstringsInRange:NSMakeRange(0, text.length)
                             options:NSStringEnumerationByComposedCharacterSequences
                          usingBlock:^(NSString *substring, NSRange substringRange, NSRange enclosingRange, BOOL *stop) {
                              unicharCount += substringRange.length;
                              // Stop once maxCharacters composed characters have been counted.
                              if (++charCount >= maxCharacters)
                                  *stop = YES;
                          }];
    // Cut on a composed-character boundary, never inside a surrogate pair.
    NSString *textStart = [text substringToIndex:unicharCount];
    NSLog(@"textStart: '%@'", textStart);
    

    Output:

    textStart: 'qqq😄'
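Applied to the question's own input, the same idea yields the length and first character the asker wanted. The following is a minimal sketch, not a definitive implementation: the helper names ComposedCharacterCount and FirstComposedCharacter are hypothetical and chosen only for illustration, while the NSString methods they call are standard Foundation API.

```objective-c
#import <Foundation/Foundation.h>

// Hypothetical helper: counts composed character sequences
// (user-perceived characters) instead of UTF-16 code units.
static NSUInteger ComposedCharacterCount(NSString *text) {
    __block NSUInteger count = 0;
    [text enumerateSubstringsInRange:NSMakeRange(0, text.length)
                             options:(NSStringEnumerationByComposedCharacterSequences |
                                      NSStringEnumerationSubstringNotRequired)
                          usingBlock:^(NSString *substring, NSRange substringRange,
                                       NSRange enclosingRange, BOOL *stop) {
        // substring is nil here because only the count is needed.
        count++;
    }];
    return count;
}

// Hypothetical helper: returns the first composed character sequence,
// or nil when the string is empty.
static NSString *FirstComposedCharacter(NSString *text) {
    if (text.length == 0) {
        return nil;
    }
    // The range covers both halves of a surrogate pair when index 0 starts one.
    NSRange range = [text rangeOfComposedCharacterSequenceAtIndex:0];
    return [text substringWithRange:range];
}
```

With these helpers, ComposedCharacterCount(@"😄q") gives 2 and FirstComposedCharacter(@"😄q") gives @"😄", which can be logged with the %@ format specifier. The %c specifier in the question only prints a single byte, which is why it showed "=" for the first half of the surrogate pair.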

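    An alternative approach is to use UTF-32 encoding, in which every code point takes exactly four bytes. Note that this counts code points rather than composed character sequences, so a character built from several code points (a flag emoji, or a letter followed by a combining accent) can still be split: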

    int byteCount = maxCharacters * 4; // each UTF-32 code unit is 4 bytes
    char buffer[byteCount];
    NSUInteger usedBufferCount = 0;
    // Encode and decode with the same explicit byte order so the round trip is consistent.
    [text getBytes:buffer maxLength:byteCount usedLength:&usedBufferCount encoding:NSUTF32LittleEndianStringEncoding options:0 range:NSMakeRange(0, text.length) remainingRange:NULL];
    NSString *textStart = [[NSString alloc] initWithBytes:buffer length:usedBufferCount encoding:NSUTF32LittleEndianStringEncoding];
    

    There is some rationale for this in Session 128, "Advanced Text Processing", from WWDC 2011.
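The UTF-32 idea can also be used just to count code points, since each one occupies four bytes in that encoding. This is a minimal sketch under that assumption; the helper name CodePointCount is hypothetical, and the result is a code point count, not a count of user-perceived characters.

```objective-c
#import <Foundation/Foundation.h>

// Hypothetical helper: number of Unicode code points in the string.
// Every UTF-32 code unit is 4 bytes and holds exactly one code point.
static NSUInteger CodePointCount(NSString *text) {
    return [text lengthOfBytesUsingEncoding:NSUTF32LittleEndianStringEncoding] / 4;
}
```

For @"😄q" this gives 2 where length gives 3. For text that uses combining marks or multi-code-point emoji, the result will be larger than the number of characters a user sees, so enumerating composed character sequences is the safer choice there.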