For the following string vector s, I want to remove the leading zeros in each element, which is the reverse of the answer from this link:
s <- c('week 01st', 'weeks 02nd', 'year2022week01st', 'week 4th')
The expected result looks like:
s <- c('week 1st', 'weeks 2nd', 'year2022week1st', 'week 4th')
I tested the following code, but it does not work, presumably because the regex is incomplete:
s <- 'week 01st'
sub('^0+(?=[1-9])', '', s, perl=TRUE)
sub('^0+([1-9])', '\\1', s)
Out:
[1] "week 01st"
How could I do that using R?
Update: the following code contributed by @dvantwisk works for year2022week01st, but not for the other elements:
s <- c('week 01st', 'weeks 02nd', 'year2022week01st', 'week 4th')
gsub('(year[0-9]{4,})(week)(0{0,})([1-9]{1})([0-9a-zA-Z]{1,})', '\\1\\2\\4\\5', s)
Out:
[1] "week 01st" "weeks 02nd" "year2022week1st" "week 4th"
You might use:
weeks?\h*\K0+(?=[1-9]\d*[a-zA-Z])
The pattern matches:
weeks? : match week with an optional s
\h*\K : match optional spaces and forget what is matched so far
0+ : match 1+ times a zero
(?=[1-9]\d*[a-zA-Z]) : positive lookahead, assert a char 1-9, optional digits and a char a-zA-Z to the right
See a Regex demo and an R demo.
In the replacement use an empty string.
For example
s <- c('week 01st', 'weeks 02nd', 'year2022week01st', 'week 4th')
gsub("weeks?\\h*\\K0+(?=[1-9]\\d*[a-zA-Z])", '', s, perl=TRUE)
Output
[1] "week 1st" "weeks 2nd" "year2022week1st" "week 4th"
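To see what \K contributes, it may help to compare the same pattern with and without it on a single element. This is only a small sketch; the variable names without_k and with_k are made up for illustration.

```r
s <- "week 01st"

# Without \K the match includes the "week " prefix, so replacing the
# match with an empty string deletes the prefix as well.
without_k <- sub("weeks?\\h*0+(?=[1-9]\\d*[a-zA-Z])", "", s, perl = TRUE)

# With \K the reported match starts after the prefix, so only the
# zeros are replaced.
with_k <- sub("weeks?\\h*\\K0+(?=[1-9]\\d*[a-zA-Z])", "", s, perl = TRUE)

print(without_k)  # "1st"
print(with_k)     # "week 1st"
```

In both calls the lookahead keeps the digit and the suffix out of the match, which is why they survive the replacement.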
Or with 2 capture groups:
(weeks?\h*)0+([1-9]\d*[a-zA-Z])
Example:
s <- c('week 01st', 'weeks 02nd', 'year2022week01st', 'week 4th')
gsub("(weeks?\\h*)0+([1-9]\\d*[a-zA-Z])", '\\1\\2', s, perl=TRUE)
Output
[1] "week 1st" "weeks 2nd" "year2022week1st" "week 4th"
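As a quick sanity check, both variants can be compared against the expected vector from the question. This is a sketch that assumes perl = TRUE is passed in both calls, since \h and \K are PCRE features; the names expected, via_reset and via_groups are only illustrative.

```r
s <- c("week 01st", "weeks 02nd", "year2022week01st", "week 4th")
expected <- c("week 1st", "weeks 2nd", "year2022week1st", "week 4th")

# Variant 1: reset the match start with \K and replace the zeros with "".
via_reset <- gsub("weeks?\\h*\\K0+(?=[1-9]\\d*[a-zA-Z])", "", s, perl = TRUE)

# Variant 2: capture the text around the zeros and put it back.
via_groups <- gsub("(weeks?\\h*)0+([1-9]\\d*[a-zA-Z])", "\\1\\2", s, perl = TRUE)

# Both calls should reproduce the expected result exactly;
# stopifnot() raises an error if either one does not.
stopifnot(identical(via_reset, expected), identical(via_groups, expected))
```

If perl = TRUE is left out of the capture-group call, \h is not treated as horizontal whitespace, and only year2022week01st (which has no space before the zero) gets changed.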