I have a R data frame that contains a column with the following type of strings:
DBR 0 1/2 02/15/25
FRTR 3 04/25/22
BTPS 1.35 04/15/22
I would like to use a regular expression to match the portion of the string between the first space and the last space.
The output would therefore be:
0 1/2
3
1.35
Some background information:
These are bond descriptions. The first segment of the first line is a country key (DBR = Germany). The last part refers to the maturity date (for the first bond February 15th 2025).
In between the country key and the maturity date, several different conventions are used for the coupon of the bond. For example the German bond has a 0.5% coupon, the second (French) bond has a 3% coupon and the final (Italian) one has a 1.35% coupon.
I already figured out how to match the country key and maturity date via
^[^\\s]+ (for the country key)
[^\\s]+$ (for the maturity date)
Then I want to process the coupons to a uniform format after the match for further calculations.
0 1/2 > 0,5
3 > 3.0
1.35 > 1.35
The mixed formats of the coupons is also why I would like to extract only between the first and last space. the first bond for example has an additional space in the coupon.
Thank you.
Here is a complete walk-through in base R
:
df <- data.frame(junk = c("DBR 0 1/2 02/15/25", "FRTR 3 04/25/22", "BTPS 1.35 04/15/22"), stringsAsFactors = FALSE)
df$coupon <- sapply(df$junk, function (item) {
frac <- sub(".*?([\\d./]+)$", "\\1", sub(".*?\\s+(.*)\\s.*", "\\1", item, perl=TRUE), perl = TRUE)
eval(parse(text = frac))
})
df
junk coupon
1 DBR 0 1/2 02/15/25 0.50
2 FRTR 3 04/25/22 3.00
3 BTPS 1.35 04/15/22 1.35
The idea is to apply two regular expressions and eval()
ing the result.
dplyr
and some error handling:
library(dplyr)
df <- data_frame(junk = c("DBR 0 1/2 02/15/25",
"FRTR 3 04/25/22",
"BTPS 1.35 04/15/22",
"someweirdojunk"))
make_coupon <- function(col) {
result <- sapply(col, function (item) {
tryCatch({
frac <- sub(".*?([\\d./]+)$", "\\1", sub(".*?\\s+(.*)\\s.*", "\\1", item))
eval(parse(text = frac))
}, error = function(e) {
NA
})
})
return(result)
}
df %>%
mutate(coupon = make_coupon(junk))
This generates:
# A tibble: 4 x 2
junk coupon
<chr> <dbl>
1 DBR 0 1/2 02/15/25 0.500
2 FRTR 3 04/25/22 3.00
3 BTPS 1.35 04/15/22 1.35
4 someweirdojunk NA