Search code examples
regexexcelvbaregex-groupregex-greedy

VBA Strange RegEx Issue


I am using Visual Basic in Excel to change some data

The pattern:

"^(" & Chr(167) & "\s)(\d+\:\d+)\.(\w+.*$)"

In some cases, the text is:

§ 34:149.Lorem ipsum dolor sit amet

and I want to separate the text into

34:149 in the current cell and then "Lorem ipsum dolor sit amet" in the next column

The problem I'm having is when the given values look like this:

§ 53:178.40

The RegEx returns this: 53:178 as one group and then .40 as the other

I can't get consistency on the returned subexpressions group because with nnn:nnn.nn, .nn is returned as an additional group as opposed to a member of the the already delineated group.

I'm confused with the operation here. I'm missing something, because I would expect any entries of the form NN:NN.NN to be rejected as matches.

After 40 minutes of trial and error, I can't figure the glitch.

The RegEx code snippet is:

strPattern = "^(" & Chr(167) & "\s)(\d+\:\d+)\.(\w+.*$)"
replaceString = "$2"
replaceString = "$2$3"
replaceString2 = "$4"

RE.Global = True
RE.MultiLine = True
RE.IgnoreCase = False
RE.pattern = strPattern

result1 = RE.Replace(source, replaceString)
result2 = RE.Replace(source, replaceString2)

Solution

  • \w matches digits, too. Use [a-zA-Z] instead to match any ASCII letter (you do not even need + after it):

    "^(" & Chr(167) & "\s)(\d+:\d+)\.([a-zA-Z].*$)"
    

    See the regex demo

    Also, : is not a special regex metacharacter, it needs no escaping.

    In case you want to make sure the char after . is not a digit use [^0-9] or \D after \.:

    ^(§\s)(\d+:\d+)\.(\D.*)
    

    See this regex demo. If it cannot be a whitespace either use

    ^(§\s)(\d+:\d+)\.([^\d\s].*)
    

    See another demo.