Why does stringr::str_extract always return NA for a certain character vector?

I've been trying to use str_extract to extract dates from data I scraped from the website of the World Trade Organization. The problem is that, for whatever reason, it always returns NA. However, when I type the strings in myself, the function works. Any ideas as to what is going on?

> country_comparison$status[1:10]
 [1] "Settled or terminated (withdrawn, mutually agreed solution) on 29 March 1995" "Implementation notified by respondent on 25 September 1997"                  
 [3] "In consultations on 4 April 1995"                                             "Implementation notified by respondent on 25 September 1997"                  
 [5] "Settled or terminated (withdrawn, mutually agreed solution) on 20 July 1995"  "Settled or terminated (withdrawn, mutually agreed solution) on 19 July 1995" 
 [7] "Settled or terminated (withdrawn, mutually agreed solution) on 5 July 1996"   "Mutually acceptable solution on implementation notified on 9 January 1998"   
 [9] "Panel established, but not yet composed on 11 October 1995"                   "Mutually acceptable solution on implementation notified on 9 January 1998"   

> country_comparison$status[1:10] %>% str_extract(pattern = "[0-9]{1,2} [A-Za-z]+ [0-9]{4}")
 [1] NA NA NA NA NA NA NA NA NA NA

> c("Settled or terminated (withdrawn, mutually agreed solution) on 29 March 1995", "Implementation notified by respondent on 25 September 1997") %>% str_extract(pattern = "[0-9]{1,2} [A-Za-z]+ [0-9]{4}")
[1] "29 March 1995"     "25 September 1997"

Solution

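Kind of a guess, but if those strings were scraped from www.wto.org and the first one originates from https://www.wto.org/english/tratop_e/dispu_e/cases_e/ds1_e.htm, then, depending on how they were collected, they might contain a few non-breaking spaces (U+00A0). These print like regular spaces but are a different character, so a literal " " in the pattern does not match them: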

<span class="paraboldcolourtext">
    Settled or terminated (withdrawn, mutually agreed solution)
</span> on <b>29&nbsp;March&nbsp;1995</b>
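To check whether this is what happens in the scraped data, one can search for U+00A0 directly and look at the code points. A minimal sketch: `status` below is a hypothetical stand-in for `country_comparison$status`, with non-breaking spaces put into the first element by hand.

```r
library(stringr)

# Hypothetical stand-in for country_comparison$status:
# element 1 uses non-breaking spaces (U+00A0), element 2 ordinary spaces
status <- c(
  "Settled or terminated (withdrawn, mutually agreed solution) on 29\u00A0March\u00A01995",
  "In consultations on 4 April 1995"
)

# TRUE for elements containing at least one non-breaking space
has_nbsp <- str_detect(status, "\u00A0")
has_nbsp  # TRUE FALSE

# Code points of the last 13 characters ("29 March 1995"):
# 160 is U+00A0, whereas a regular space would be 32
utf8ToInt(str_sub(status[1], -13))

# The pattern with literal spaces fails only on the U+00A0 element
str_extract(status, "[0-9]{1,2} [A-Za-z]+ [0-9]{4}")  # NA "4 April 1995"
```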

Try replacing " " (space) in the regex with \\s, which matches any whitespace character, including the non-breaking space:

library(stringr)
s <- "Settled or terminated (withdrawn, mutually agreed solution) on 29\u00A0March\u00A01995"
# looks like a regular space:
s
#> [1] "Settled or terminated (withdrawn, mutually agreed solution) on 29 March 1995"

# until you check it with something that can highlight unusual whitespace:
stringr::str_view(s)
#> [1] │ Settled or terminated (withdrawn, mutually agreed solution) on 29{\u00a0}March{\u00a0}1995

# replacing " " in regex with \\s:
str_view(s,"[0-9]{1,2}\\s[A-Za-z]+\\s[0-9]{4}")
#> [1] │ Settled or terminated (withdrawn, mutually agreed solution) on <29{\u00a0}March{\u00a0}1995>
str_extract(s,"[0-9]{1,2}\\s[A-Za-z]+\\s[0-9]{4}")
#> [1] "29 March 1995"

Created on 2023-09-23 with reprex v2.0.2
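Another option is to normalise the non-breaking spaces to ordinary spaces once, right after scraping, so that patterns written with a literal space keep working everywhere downstream. A sketch assuming U+00A0 is the only unusual whitespace in the data; other characters (for example U+2009, thin space) would need to be added to the replacement pattern. `status` is again a hypothetical stand-in for the scraped vector.

```r
library(stringr)

# Hypothetical stand-in for the scraped vector:
# element 1 uses U+00A0, element 2 uses ordinary spaces
status <- c(
  "Settled or terminated (withdrawn, mutually agreed solution) on 29\u00A0March\u00A01995",
  "Implementation notified by respondent on 25 September 1997"
)

# Replace every non-breaking space with a regular space
status_clean <- str_replace_all(status, "\u00A0", " ")

# The original pattern with literal spaces now matches every element
str_extract(status_clean, "[0-9]{1,2} [A-Za-z]+ [0-9]{4}")
#> [1] "29 March 1995"     "25 September 1997"
```

Cleaning once at the point of scraping avoids having to remember \\s in every later regex, at the cost of silently changing the raw text.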