I want to count the years found between the opening and closing brackets in the following text, named txt.
library(stringr)
txt <- "Text Mining exercise (2020) Mining, p. 628508; Computer Science text analysis (1998) Computer Science, p.345-355; Introduction to data mining (2015) J. Data Science, pp. 31-33"
lengths(strsplit(txt,"\\(\\d{4}\\)"))
gives me 4, which is wrong. Any help, please?
I think you are looking for stringr::str_count():
str_count(txt, "\\([0-9]{4}\\)")
[1] 3
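As for why the original attempt returns 4: strsplit() returns the pieces of text around the matches, not the matches themselves, so three matches in the middle of the string leave four pieces. If you would rather stay in base R, a rough sketch (the variable names pieces and matches are just for illustration) is to collect the matches with gregexpr() and regmatches() and count those instead:

```r
txt <- "Text Mining exercise (2020) Mining, p. 628508; Computer Science text analysis (1998) Computer Science, p.345-355; Introduction to data mining (2015) J. Data Science, pp. 31-33"

# strsplit() keeps the text between the matches, so 3 matches give 4 pieces
pieces <- strsplit(txt, "\\(\\d{4}\\)")[[1]]
length(pieces)

# gregexpr() locates every match and regmatches() pulls them out
matches <- regmatches(txt, gregexpr("\\([0-9]{4}\\)", txt))[[1]]
length(matches)
```

Here length(pieces) is 4 and length(matches) is 3, which agrees with str_count().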
To count only four-digit numbers in parentheses whose first digit is 1 or 2 and whose second digit is 0 or 9 (note that this also accepts values such as 1099 or 2999):
str_count(txt, "\\([12][09][0-9]{2}\\)")
Strictly starting with either 19 or 20:
str_count(txt, "\\(19[0-9]{2}\\)|\\(20[0-9]{2}\\)")
# In R 4.0 or later, a raw string avoids the doubled backslashes
str_count(txt, r"(\(19[0-9]{2}\)|\(20[0-9]{2}\))")
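If you also want to see which years were counted, not just how many, one possible sketch is to capture the digits with stringr::str_match_all(). The pattern below is my own shorthand for the 19/20 alternation, using a non-capturing group, and the names m and years are only illustrative:

```r
library(stringr)

txt <- "Text Mining exercise (2020) Mining, p. 628508; Computer Science text analysis (1998) Computer Science, p.345-355; Introduction to data mining (2015) J. Data Science, pp. 31-33"

# One capture group around the four digits; (?:19|20) restricts the century
m <- str_match_all(txt, "\\(((?:19|20)[0-9]{2})\\)")[[1]]

# Column 1 holds the full match, e.g. "(2020)"; column 2 holds the captured digits
years <- as.integer(m[, 2])
years
length(years)
```

On this text, years holds 2020, 1998 and 2015, so length(years) matches the count from str_count().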