In my data (which is text), there are abbreviations.
Are there any functions or code that search for abbreviations in text? For example, detecting 3-4-5 capital letter abbreviations and letting me count how often they occur.
Much appreciated!
detecting 3-4-5 capital letter abbreviations
You may use
\b[A-Z]{3,5}\b
See the regex demo
Details:
\b - a word boundary
[A-Z]{3,5} - 3, 4 or 5 capital letters (use [[:upper:]] to match letters other than ASCII, too)
\b - a word boundary.

R demo online (leveraging the regex occurrence count code from @TheComeOnMan):
abbrev_regex <- "\\b[A-Z]{3,5}\\b"
x <- "XYZ was seen at WXYZ with VWXYZ and did ABCDEFGH."
sum(gregexpr(abbrev_regex, x)[[1]] > 0)
## => [1] 3
regmatches(x, gregexpr(abbrev_regex, x))[[1]]
## => [1] "XYZ" "WXYZ" "VWXYZ"
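
To count how often each individual abbreviation occurs (rather than only the total number of matches), the same regex can be combined with table(). This is a minimal sketch: the docs vector is made-up sample data, and the names hits and abbrev_counts are only illustrative.

```r
abbrev_regex <- "\\b[A-Z]{3,5}\\b"

# Hypothetical sample input: one string per document
docs <- c("NASA and ESA met; NASA led.",
          "The FBI called NASA.",
          "no abbreviations here")

# Extract the matches from every string and flatten them into one character vector
hits <- unlist(regmatches(docs, gregexpr(abbrev_regex, docs)))

# Tabulate the matches and sort by frequency, most common first
abbrev_counts <- sort(table(hits), decreasing = TRUE)
abbrev_counts
```

With this sample input, NASA is counted three times and ESA and FBI once each. Strings without any match contribute nothing, because regmatches() returns an empty character vector for them.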