I am trying to extract a date from a piece of text, but I keep getting a warning and the result is NA.
I have the following text:
"IRA-401K Investment Assets Under Management (AUM) As of July 31, 2018 BMG Funds
$217,743,573 BMG BullionBars $45,176,561 TOTAL $262,920,134 Physical Holdings Download
Scotiabank BMG BullionBars List Download Brinks BMG BullionBars List Holdings by Ounces As
of July 31, 2018 Gold Bars 21,132.496 Silver Bars 453,531.574 Silver Coins
80,500 Platinum Bars"
The text contains the date July 31, 2018, which appears twice.
I used the following code to extract the dates from the text:
test_take <- lapply(cleanurl_text, parse_date_time, orders = "mdy",
locale = Sys.setlocale('LC_TIME', locale = "English_Canada.1252"))
I get the following warning message:
Warning message: All formats failed to parse. No formats found.
When I include exact = TRUE:
test_take <- lapply(as.character(cleanurl_text), parse_date_time, orders = "mdy",
locale = Sys.setlocale('LC_TIME', locale = "English_Canada.1252"), exact = TRUE)
I get the following warning:
Warning message: 1 failed to parse.
The resulting object still contains NA.
The following regex can extract the date in the posted format.
pattern <- paste(month.name, collapse = "|")
pattern <- paste0("(", pattern, ")\\s\\d{1,2}.{1,2}\\d{4}")
m <- gregexpr(pattern, cleanurl_text)
regmatches(cleanurl_text, m)
#[[1]]
#[1] "July 31, 2018" "July 31, 2018"
Note that this can be done in a single line, regmatches(cleanurl_text, gregexpr(pattern, cleanurl_text)), but I have opted for two lines to make it more readable.
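If you need actual Date objects rather than character strings, the matches can be converted afterwards. The following is only a sketch under some assumptions: the sample string is a shortened stand-in for cleanurl_text, and to_date is a hypothetical helper name, not part of any package. It maps the month name through month.name instead of relying on as.Date with "%B", so it should not depend on the LC_TIME locale.

```r
# Shortened stand-in for cleanurl_text (assumption: the real object is a
# character vector holding text like this)
cleanurl_text <- paste(
  "Assets Under Management (AUM) As of July 31, 2018 BMG Funds $217,743,573",
  "Holdings by Ounces As of July 31, 2018 Gold Bars 21,132.496"
)

# Same pattern as above: an English month name, a day, a separator, a year
pattern <- paste0("(", paste(month.name, collapse = "|"), ")\\s\\d{1,2}.{1,2}\\d{4}")

# One-line form of the extraction, flattened to a character vector
found <- unlist(regmatches(cleanurl_text, gregexpr(pattern, cleanurl_text)))

# Hypothetical helper: turn strings like "July 31, 2018" into Date objects
to_date <- function(x) {
  month <- match(sub("\\s.*$", "", x), month.name)  # month name -> 1..12
  nums  <- regmatches(x, gregexpr("\\d+", x))       # day and year, in that order
  day   <- as.integer(sapply(nums, `[`, 1))
  year  <- as.integer(sapply(nums, `[`, 2))
  as.Date(sprintf("%04d-%02d-%02d", year, month, day))
}

dates <- to_date(found)
unique(dates)
#[1] "2018-07-31"
```

unique() is used here only because the same date occurs twice in the text; drop it if you want one value per match.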