iOS regex matching non-whitespace characters up to first non-word character after @ symbol


I am having a hard time getting my regular expression to work. I am writing an app that supports tagging in the comments section, so every time there is an @ symbol I need to look at the text that follows it and turn it into a link, the way Instagram and Twitter do.

Below is my regular expression. I need to get all occurrences that fit these criteria: an @ followed by any alphanumeric character, ending when it reaches a space or another @ symbol.

 NSString *searchedString = cellComment.commentText;
 NSRange   searchedRange = NSMakeRange(0, [searchedString length]);
 NSString *pattern = @"@.+[^\s]";
 NSError  *error = nil;

 NSRegularExpression* regex = [NSRegularExpression regularExpressionWithPattern: pattern options:0 error:&error];
 NSArray* matches = [regex matchesInString:searchedString options:0 range: searchedRange];
 for (NSTextCheckingResult* match in matches) {
      NSString* matchText = [searchedString substringWithRange:[match range]];
      for(int i = 0; i< match.numberOfRanges;i++)
      {
           NSRange group1 = [match rangeAtIndex:i];
           NSLog(@"group1: %@%lu", [searchedString substringWithRange:group1],group1.location);
      }

 }

Solution

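  • You had [^\s] in your initial post version. The issue is that backslashes must be doubled in Objective-C string literals: the regex engine needs to receive \s, so the literal has to be written as \\s. With a single backslash, the compiler treats \s as an unknown escape sequence and the regex engine never sees the backslash. Also, .+ matches one or more characters other than a newline, and because it is greedy it runs as far along the line as it can instead of stopping at the first space, which is not what you need.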

    You can use

    NSString *pattern = @"\\B@\\w\\S*\\b";
    

    The pattern matches:

    • \B - a non-word boundary (there must be no word character before @, remove if you need to match in such contexts)
    • @ - a literal @
    • \w - a word character: a letter, digit, or underscore (use \p{L} if the first one should be a letter, or [\p{L}\d] if you want to allow a letter or digit in the initial position; double these backslashes in the string literal as well)
    • \S* - zero or more non-whitespace characters up to...
    • \b - a word boundary.
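
    To see how the pieces above behave together, here is a minimal sketch that runs the pattern over a sample comment. The helper name MatchesForPattern and the sample text are made up for illustration and are not part of the original post.

    ```objc
    #import <Foundation/Foundation.h>

    // Hypothetical helper (not from the original post): returns every substring
    // of `text` matched by `pattern`, in order of appearance.
    static NSArray<NSString *> *MatchesForPattern(NSString *pattern, NSString *text) {
        NSError *error = nil;
        NSRegularExpression *regex =
            [NSRegularExpression regularExpressionWithPattern:pattern
                                                      options:0
                                                        error:&error];
        if (regex == nil) {
            // The pattern failed to compile; report it and return nothing.
            NSLog(@"Invalid pattern: %@", error);
            return @[];
        }

        NSMutableArray<NSString *> *found = [NSMutableArray array];
        NSRange whole = NSMakeRange(0, text.length);
        for (NSTextCheckingResult *match in
             [regex matchesInString:text options:0 range:whole]) {
            [found addObject:[text substringWithRange:match.range]];
        }
        return found;
    }

    int main(void) {
        @autoreleasepool {
            // Sample comment text, invented for this example.
            NSString *comment = @"Thanks @alice and @bob_99, mail me at me@example.com";
            NSArray<NSString *> *tags = MatchesForPattern(@"\\B@\\w\\S*\\b", comment);
            NSLog(@"%@", tags);
        }
        return 0;
    }
    ```

    With this input the result should contain @alice and @bob_99. The trailing comma is dropped because \b forces the match to end on a word character, and the @ in me@example.com is skipped because \B rejects an @ that directly follows a word character. One thing to watch: \S also matches @, so two tags written without a space between them (such as @alice@bob) would come back as a single match.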

    Note that Twitter usernames follow this pattern:

    NSString *pattern = @"@\\w+";
    

    The \w+ matches one or more word characters (letters, digits, or underscores).
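
    Since the goal is to turn each tag into a link, you will probably want both the range of the whole tag and the bare username. The sketch below is one way to get them. It assumes a small variation on the pattern above, with parentheses added around \w+ so that the name becomes capture group 1; the sample text is made up.

    ```objc
    #import <Foundation/Foundation.h>

    int main(void) {
        @autoreleasepool {
            // Sample comment text, invented for this example.
            NSString *comment = @"cc @alice@bob (see @carol's post)";

            // Assumption: capture group added so the name (without @) is group 1.
            NSError *error = nil;
            NSRegularExpression *regex =
                [NSRegularExpression regularExpressionWithPattern:@"@(\\w+)"
                                                          options:0
                                                            error:&error];
            if (regex == nil) {
                NSLog(@"Invalid pattern: %@", error);
                return 1;
            }

            [regex enumerateMatchesInString:comment
                                    options:0
                                      range:NSMakeRange(0, comment.length)
                                 usingBlock:^(NSTextCheckingResult *match,
                                              NSMatchingFlags flags,
                                              BOOL *stop) {
                NSRange tagRange = match.range;              // covers "@name"
                NSRange nameRange = [match rangeAtIndex:1];  // covers "name" only
                NSLog(@"tag at %lu (length %lu), name: %@",
                      (unsigned long)tagRange.location,
                      (unsigned long)tagRange.length,
                      [comment substringWithRange:nameRange]);
            }];
        }
        return 0;
        }
    ```

    Because \w+ stops at the first non-word character, @alice@bob is reported as two separate tags here, and @carol's yields just carol. This version has no \B guard, so it would also pick up the @example part of an email address; prepend \\B to the pattern if that matters for your comments.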
