relatively new to R, need help with applying a regex-based substitution. I have a data frame in one column of which I have a sequence of digits (my values of interest) followed by a string of all sorts of characters. Example:
4623(randomcharacters)
I need to remove everything after the initial digits to continue working with the values. My idea was to use gsub to remove the non-digit characters by positive lookbehind. The code I have is:
sub_function <- function() {
gsub("?<=[[:digit:]].", " ", fixed = T)
}
data_frame$`x` <- data_known$`x` %>%
sapply(sub_function)
But I then get the error:
Error in FUN(X[[i]], ...) : unused argument (X[[i]])
Any help would be greatly appreciated!
Here is a base R function.
It uses sub
, not gsub
, since there will be only one substitution. And there's no need for look behind, the meta-character ^
marks the beginning of the string, followed by an optional minus sign, followed by at least one digit. Everything else is discarded.
sub_function <- function(x){
sub("(^-*[[:digit:]]+).*", "\\1", x)
}
data <- data.frame(x = c("4623(randomcharacters)", "-4623(randomcharacters)"))
sub_function(data$x)
#[1] "4623" "-4623"
With this simple modification the function returns a numeric vector.
sub_function <- function(x){
y <- sub("(^-*[[:digit:]]+).*", "\\1", x)
as.numeric(y)
}