I have a data.frame with two variables containing string expressions like "ABC`w/XYZ 8", where w is any number from 1 to 999. I need to extract w and replace the whole string with it. I use this code:
df <- data.frame(a = c("ABC`5/XYZ 8", "A`25/BHU 19", "ach`246/chy 0"), b = c("sfse`3/cjd 65", "jlke`234/Chu 19", "h`45/hy 0"))
df$a <- sapply(df$a, function(x) {substr(df$a[x], regexpr("`[0-9]+/", df$a[x]) +1,
+ regexpr("`[0-9]+/", df$a[x]) + attr(regexpr("`[0-9]+/", df$a[x]), "match.length")-2)})
It works, but instead of a = c(5, 25, 246) I get a = c(25, 5, 246). I guess this happens because a is a factor. However, when a is a character vector, I get NAs as output. Is there a way to preserve the order of a, or to use sapply and substr on a character vector?
We can use sub to extract the number in the 'w' position of the string. Match one or more letters or backticks ([A-Za-z`]+), capture the one or more digits that follow as a group ((\\d+)), match the remaining characters (.*), and replace the whole match with the backreference to the capture group (\\1):
as.numeric(sub("[A-Za-z`]+(\\d+).*", "\\1", df$a))
#[1] 5 25 246
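Since the question mentions two variables, the same sub call can be applied to every column at once. The following is a minimal sketch, assuming both columns follow the same "letters, backtick, number" layout; the helper name extract_w is hypothetical and not part of the original answer.

```r
# Sample data from the question; stringsAsFactors = FALSE keeps the columns as character
df <- data.frame(
  a = c("ABC`5/XYZ 8", "A`25/BHU 19", "ach`246/chy 0"),
  b = c("sfse`3/cjd 65", "jlke`234/Chu 19", "h`45/hy 0"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: keep only the digits that follow the leading letters and backtick.
# as.character() makes it work for factor columns as well.
extract_w <- function(x) {
  as.numeric(sub("[A-Za-z`]+(\\d+).*", "\\1", as.character(x)))
}

# df[] <- lapply(...) replaces every column while keeping the data.frame structure
df[] <- lapply(df, extract_w)

# df$a is now 5, 25, 246 and df$b is 3, 234, 45, in the original row order
```

Because sub is vectorized over the whole column, no element-by-element indexing is involved, so the row order is preserved.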
Another option is str_extract from the stringr package, which returns the first run of digits in each string:
library(stringr)
as.numeric(str_extract(df$a, "\\d+"))
#[1] 5 25 246
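As for why the original attempt misbehaves: inside sapply, x is already one element of df$a, so df$a[x] indexes the column a second time. With a factor, x is treated as its integer level code, which picks rows in level order; with a character vector, x is treated as a name, which yields NA. A minimal sketch of the same regexpr/substr idea applied to the whole vector at once, with no indexing inside a loop, could look like this (the variable names a, m, first and last are illustrative only):

```r
# Column values from the question, as a plain character vector
a <- c("ABC`5/XYZ 8", "A`25/BHU 19", "ach`246/chy 0")

# regexpr is vectorized: one match position and one match length per string
m <- regexpr("`[0-9]+/", a)

# The match spans the backtick, the digits, and the slash;
# skip the backtick at the start and the slash at the end
first <- as.integer(m) + 1
last  <- as.integer(m) + attr(m, "match.length") - 2

w <- as.numeric(substr(a, first, last))
# w is 5, 25, 246
```

If the column is a factor, wrapping it in as.character() before calling regexpr keeps the behaviour the same.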