I noticed this very strange behavior when trying to match the <…> tags:
let s = "TEST \r\n\r\n<strong>more:</strong>"
let re = try! NSRegularExpression(pattern: "<.*?>")
let matches = re.matches(in: s, range: NSRange(location: 0, length: s.count))
This results in only 1 match (it should have been 2: <strong> and </strong>):
▿ 1 element
- 0 : <NSSimpleRegularExpressionCheckingResult: 0x600003be3ac0>{9, 8}{<NSRegularExpression: 0x600002019080> <.*?> 0x0}
However, when I remove the \r\n from the checked input text:
let s = "TEST <strong>more:</strong>"
I get the expected 2 matches!
▿ 2 elements
- 0 : <NSSimpleRegularExpressionCheckingResult: 0x600002e0ea00>{5, 8}{<NSRegularExpression: 0x6000035faaf0> <.*?> 0x0}
- 1 : <NSSimpleRegularExpressionCheckingResult: 0x600002e0ed40>{18, 9}{<NSRegularExpression: 0x6000035faaf0> <.*?> 0x0}
What is going on?
The problem is due to the way String encodes \r\n as a single Character:
let s2 = "\r\n"
print(s2.count) // 1
print(s2.utf8.count) // 2
print(s2.utf8.map { String(format: "%02x", $0) }.joined()) // "0d0a"
In your example there are 31 ASCII characters, but each \r\n is encoded as a single Character:
let s = "TEST \r\n\r\n<strong>more:</strong>"
print(s.count) // 29
print(s.utf8.count) // 31
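The length that NSRange and NSString actually measure, however, is UTF-16 code units, not Characters; for this string that is also 31:
print(s.utf16.count)          // 31
print((s as NSString).length) // 31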
The NSRange you calculate uses the Swift string's Character count to specify a range in the NSString, but NSRange offsets count UTF-16 code units. The range is therefore two code units too short and effectively cuts off the end of the string, so the closing </strong> never falls inside the searched range. This can easily be confirmed by adding two or more characters to the end of the string and seeing that two matches are returned, as in the snippet below.
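A minimal sketch of that check, assuming the same pattern as above:
import Foundation

let re = try! NSRegularExpression(pattern: "<.*?>")
// Two extra characters at the end: the Character-based length is still "wrong",
// but it is now long enough to reach past the closing </strong> in UTF-16 terms.
let padded = "TEST \r\n\r\n<strong>more:</strong>XX"
let paddedMatches = re.matches(in: padded,
                               range: NSRange(location: 0, length: padded.count))
print(paddedMatches.count) // 2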
Foundation provides an NSRange initializer that builds the range from a Range<String.Index> and the string itself, and when that is used your example produces two matches:
let s = "TEST \r\n\r\n<strong>more:</strong>"
let re = try! NSRegularExpression(pattern: "<.*?>")
let range = NSRange(s.startIndex..., in: s)
let matches = re.matches(in: s, range: range)
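To pull the matched text out of those results, one way is to convert each result's NSRange back to a Range<String.Index> with Range(_:in:):
let tags = matches.compactMap { match in
    Range(match.range, in: s).map { String(s[$0]) }
}
print(tags) // ["<strong>", "</strong>"]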
You should probably move to the newer Swift Regex API rather than use the older bridged NSString and NSRegularExpression.
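A sketch of the same match with the Swift 5.7+ Regex type (note that bare /…/ regex literals may need the BareSlashRegexLiterals feature enabled in Swift 5 language mode); matches(of:) works in String.Index terms, so no NSRange conversion is needed:
let s = "TEST \r\n\r\n<strong>more:</strong>"
let tagMatches = s.matches(of: /<.*?>/)
print(tagMatches.count)                     // 2
print(tagMatches.map { String($0.output) }) // ["<strong>", "</strong>"]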