I have a character vector with each element denoting the time of data collection. Unfortunately, the elements do not follow the same pattern:
"05.1990 - 06.1990, Poland"
"11.05.1990 - 13.07.1990, Portugal"
"1993 - 1993, Romania"
Is there a neat way, using regular expressions, to extract:
(1) the year when the data collection started, and
(2) the year when the data collection ended?
If possible, I'd like to have two different regular expressions for (1) and (2).
You can do this using positive lookaheads, which require a pattern to follow the match without including it in the result. Here's an example using {stringr}:
x <- c(
"05.1990 - 06.1990, Poland",
"11.05.1990 - 13.07.1990, Portugal",
"1993 - 1993, Romania"
)
# The year when the data collection started (the four digits immediately before the dash, allowing optional whitespace)
stringr::str_extract(x, "\\d{4}(?=\\s*-)")
#> [1] "1990" "1990" "1993"
# The year when the data collection ended (the four digits immediately before the comma)
stringr::str_extract(x, "\\d{4}(?=,)")
#> [1] "1990" "1990" "1993"
Created on 2022-10-14 with reprex v2.0.2
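If you'd rather not depend on {stringr}, the same two lookahead patterns should also work in base R, as long as you pass perl = TRUE (the default regex engine does not support lookaheads). This is only a sketch of an alternative, not part of the reprex above; the variable names start_year and end_year are my own.

```r
x <- c(
  "05.1990 - 06.1990, Poland",
  "11.05.1990 - 13.07.1990, Portugal",
  "1993 - 1993, Romania"
)

# regexpr() finds the first match in each element; perl = TRUE enables lookaheads
start_year <- regmatches(x, regexpr("\\d{4}(?=\\s*-)", x, perl = TRUE))
end_year   <- regmatches(x, regexpr("\\d{4}(?=,)", x, perl = TRUE))

start_year
#> [1] "1990" "1990" "1993"
end_year
#> [1] "1990" "1990" "1993"
```

One difference to keep in mind: regmatches() silently drops elements that have no match, whereas stringr::str_extract() returns NA for them, so the base R result can be shorter than x if some strings don't follow any of the three patterns.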