My example data:
l1
[1] "xmms-1.2.11-x86_64-5" "xmms-1.2.11-x86_64-6"
[3] "xmodmap-1.0.10-x86_64-1" "xmodmap-1.0.9-x86_64-1"
[5] "xmodmap3-1.0.10-x86_64-1" "xmodmap3-1.0.9-x86_64-1"
I am using R and would like a regular expression that captures just the characters before the first dash, such as:
xmms
xmms
xmodmap
xmodmap
xmodmap3
xmodmap3
Since I am using R, the regex needs to be Perl compliant.
I thought I could do this with a lookbehind on the dash. This is the pattern I tried:
grepl("(?<=[a-z0-9])-",l1, perl=T)
but it just matches the whole string. I think if I had the first dash as a capture group, I could maybe use the lookbehind, but I don't know how to build the regex with the lookbehind and the capture group.
I looked around at some other questions for possible answers, and it seems I may need a non-greedy quantifier. I tried
grepl("(?<=[a-z0-9])-/.+?(?=-)/",l1, perl=T)
but that didn't work either.
I'm open to other suggestions on how to capture the first set of characters before the dash. I'm currently in base R, but I'm fine with using any packages, like stringr.
1) Base R
An option is sub from base R: match the first - followed by any characters (.*) and replace the match with an empty string ("").
sub("-.*", "", l1)
#[1] "xmms" "xmms" "xmodmap" "xmodmap" "xmodmap3" "xmodmap3"
Or capture as a group
sub("(\\w+).*", "\\1", l1)
#[1] "xmms" "xmms" "xmodmap" "xmodmap" "xmodmap3" "xmodmap3"
Or with regmatches/regexpr
regmatches(l1, regexpr('\\w+', l1))
#[1] "xmms" "xmms" "xmodmap" "xmodmap" "xmodmap3" "xmodmap3"
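The lookahead idea from the question can also be made to work with regmatches/regexpr. The following is a minimal sketch, not part of the original answer; both patterns are illustrations chosen for this data. The first uses a negated character class anchored at the start, the second a lazy match followed by a lookahead, which needs perl = TRUE.

```r
l1 <- c("xmms-1.2.11-x86_64-5", "xmms-1.2.11-x86_64-6", "xmodmap-1.0.10-x86_64-1",
        "xmodmap-1.0.9-x86_64-1", "xmodmap3-1.0.10-x86_64-1", "xmodmap3-1.0.9-x86_64-1")

# Negated character class: everything from the start up to,
# but not including, the first dash
res_class <- regmatches(l1, regexpr("^[^-]+", l1))

# Lazy match plus lookahead: the shortest run from the start
# that is immediately followed by a dash (PCRE syntax)
res_look <- regmatches(l1, regexpr("^.+?(?=-)", l1, perl = TRUE))

res_class
res_look
```

Note that regmatches combined with regexpr drops elements that have no match, so the result can be shorter than the input. sub leaves such elements unchanged instead.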
Or using trimws (the whitespace argument requires R >= 3.6.0)
trimws(l1, "right", whitespace = "-.*")
#[1] "xmms" "xmms" "xmodmap" "xmodmap" "xmodmap3" "xmodmap3"
Or using read.table
read.table(text = l1, sep="-", header = FALSE, stringsAsFactors = FALSE)$V1
#[1] "xmms" "xmms" "xmodmap" "xmodmap" "xmodmap3" "xmodmap3"
or with strsplit
sapply(strsplit(l1, "-"), `[`, 1)
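A possible variant of the strsplit approach, sketched here under the assumption that the separator is always a literal dash: fixed = TRUE skips regex interpretation of the separator, and vapply checks that each result is a single string. The names parts and first are only illustrative.

```r
l1 <- c("xmms-1.2.11-x86_64-5", "xmms-1.2.11-x86_64-6", "xmodmap-1.0.10-x86_64-1",
        "xmodmap-1.0.9-x86_64-1", "xmodmap3-1.0.10-x86_64-1", "xmodmap3-1.0.9-x86_64-1")

# Split each string on the literal "-" (no regex interpretation)
parts <- strsplit(l1, "-", fixed = TRUE)

# Take the first piece of each split; vapply enforces one string per element
first <- vapply(parts, `[`, character(1), 1)
first
```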
2) stringr
Or with word from stringr
library(stringr)
word(l1, 1, sep="-")
Or with str_remove
str_remove(l1, "-.*")
#[1] "xmms" "xmms" "xmodmap" "xmodmap" "xmodmap3" "xmodmap3"
3) stringi
Or with stri_extract_first from stringi
library(stringi)
stri_extract_first(l1, regex = "\\w+")
#[1] "xmms" "xmms" "xmodmap" "xmodmap" "xmodmap3" "xmodmap3"
Note: grep/grepl only detect whether a pattern occurs in a string. To replace or extract a substring in base R, use sub or regmatches with regexpr.
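To illustrate the difference, here is a small sketch using a made-up two-element vector x (the second element, "nodash", is hypothetical and only there to show the no-match case). The lookbehind pattern is the one tried in the question.

```r
x <- c("xmms-1.2.11-x86_64-5", "nodash")

# grepl only reports whether the pattern occurs in each element,
# so the pattern from the question yields TRUE/FALSE, not substrings
attempt <- grepl("(?<=[a-z0-9])-", x, perl = TRUE)
attempt
# [1]  TRUE FALSE

# sub returns the modified strings; elements without a match stay unchanged
before_dash <- sub("-.*", "", x)
before_dash
# [1] "xmms"   "nodash"
```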
Data
l1 <- c("xmms-1.2.11-x86_64-5", "xmms-1.2.11-x86_64-6", "xmodmap-1.0.10-x86_64-1",
"xmodmap-1.0.9-x86_64-1", "xmodmap3-1.0.10-x86_64-1", "xmodmap3-1.0.9-x86_64-1")