swift string nsregularexpression

NSRegularExpression doesn't work when string contains \r\n


I noticed some very strange behavior when trying to match the <...> tags:

let s = "TEST \r\n\r\n<strong>more:</strong>"
let re = try! NSRegularExpression(pattern: "<.*?>")
let matches = re.matches(in: s, range: NSRange(location: 0, length: s.count))

This results in only 1 match (there should be 2: <strong> and </strong>):

▿ 1 element
  - 0 : <NSSimpleRegularExpressionCheckingResult: 0x600003be3ac0>{9, 8}{<NSRegularExpression: 0x600002019080> <.*?> 0x0}

However, when I remove the \r\n from the input text:

let s = "TEST <strong>more:</strong>"

I get the expected 2 matches:

▿ 2 elements
  - 0 : <NSSimpleRegularExpressionCheckingResult: 0x600002e0ea00>{5, 8}{<NSRegularExpression: 0x6000035faaf0> <.*?> 0x0}
  - 1 : <NSSimpleRegularExpressionCheckingResult: 0x600002e0ed40>{18, 9}{<NSRegularExpression: 0x6000035faaf0> <.*?> 0x0}

What is going on?


Solution

  • The problem is that Swift's String treats \r\n as a single Character (one grapheme cluster), even though it consists of two code units:

    import Foundation     // needed for String(format:)

    let s2 = "\r\n"
    print(s2.count)       // 1
    print(s2.utf8.count)  // 2

    print(s2.utf8.map { String(format: "%02x", $0) }.joined())   // "0d0a"
    

    In your example there are 31 ASCII characters (so 31 UTF-8 bytes and 31 UTF-16 code units), but each \r\n is counted as a single Character:

    let s = "TEST \r\n\r\n<strong>more:</strong>"
    print(s.count)      // 29
    print(s.utf8.count) // 31
    

    NSRange is measured in UTF-16 code units, because that is how NSString counts length. The NSRange you calculate uses the Swift Character count (29) instead, so the search range stops two code units short of the end of the NSString (31). The final "g>" is cut off and </strong> can never match. This can easily be confirmed by adding two or more characters to the end of the string and seeing that two matches are returned.
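
    To see the mismatch concretely, here is a minimal sketch that runs the same pattern over both range lengths. Using s.utf16.count as the length is my own illustration of the point and not part of the code in the question; the names tooShort and full are likewise just for this example.

    ```swift
    import Foundation

    let s = "TEST \r\n\r\n<strong>more:</strong>"
    let re = try! NSRegularExpression(pattern: "<.*?>")

    // s.count counts Characters (grapheme clusters), so each "\r\n" counts once.
    // NSRange lengths are in UTF-16 code units, so each "\r\n" counts twice.
    let tooShort = NSRange(location: 0, length: s.count)        // length 29
    let full = NSRange(location: 0, length: s.utf16.count)      // length 31

    let truncated = re.matches(in: s, range: tooShort)
    let complete = re.matches(in: s, range: full)

    print(truncated.count)  // 1
    print(complete.count)   // 2
    ```

    The shorter range ends at "</stron", which is why only the opening tag is found.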

    NSRange has an initializer that converts a Range<String.Index> (or a range expression such as s.startIndex...) into the matching NSRange for that string. When that is used, your example produces two matches:

    import Foundation

    let s = "TEST \r\n\r\n<strong>more:</strong>"
    let re = try! NSRegularExpression(pattern: "<.*?>")
    // Convert the whole String range to UTF-16 based NSRange coordinates.
    let range = NSRange(s.startIndex..., in: s)
    let matches = re.matches(in: s, range: range)
    

    You should probably move to the Swift Regex API (available since Swift 5.7) rather than use the older bridged NSString and NSRegularExpression. It works on String indices directly, so this kind of length mismatch cannot occur.
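
    As a rough sketch of what that could look like for this example, assuming Swift 5.7 or later (macOS 13 / iOS 16 or later on Apple platforms): the pattern is built at run time with Regex(_:) so no regex-literal compiler flag is needed, and the names tag and tags are only illustrative.

    ```swift
    let s = "TEST \r\n\r\n<strong>more:</strong>"

    // Build the regex from the same pattern string used with NSRegularExpression.
    let tag = try! Regex("<.*?>")

    // matches(of:) returns matches whose ranges are Range<String.Index>,
    // so they can be used to slice the original String directly.
    let tags = s.matches(of: tag).map { String(s[$0.range]) }

    print(tags)  // ["<strong>", "</strong>"]
    ```

    No NSRange conversion is involved here, so the Character versus UTF-16 count difference never comes into play.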