
Regular Expression: Find a specific group within other groups in VB.Net


I need to write a regular expression that has to replace everything except for a single group.

For example:

IN                                    OUT
OK THT PHP This is it 06222021        This is it
NO MTM PYT Get this content 111111    Get this content

I wrote the following Regular Expression: (\w{0,2}\s\w{0,3}\s\w{0,3}\s)(.*?)(\s\d{6}(\s|))

This RegEx creates 4 groups. Using the first entry as an example, the groups are:

  1. OK THT PHP
  2. This is it
  3. 06222021
  4. Space character

I need a way to:

  • Replace groups 1, 3 and 4 with String.Empty

OR

  • Get group 2 ONLY

Solution

  • You don't need 4 groups. A single capture group (group 1) used in the replacement is enough, and the last part can match 6-8 digits instead of only 6.

    Note that \w{0,2} also matches an empty string; use \w{1,2} if there has to be at least one word char.

    ^\w{0,2}\s\w{0,3}\s\w{0,3}\s(.*?)\s\d{6,8}\s?$
    
    • ^ Start of string
    • \w{0,2}\s\w{0,3}\s\w{0,3}\s Match three runs of word chars (at most 2, 3 and 3 chars), each followed by a whitespace char
    • (.*?) Capture group 1, match any char as few times as possible
    • \s\d{6,8} Match a whitespace char and 6-8 digits
    • \s? Match an optional whitespace char
    • $ End of string
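
    The note above about \w{0,2} matching an empty string can be checked with a small sketch. The input below is made up for illustration, and tightening all three quantifiers to a minimum of 1 (not only the first) is an assumption about what the data should look like.

    Imports System
    Imports System.Text.RegularExpressions

    Module EmptyPrefixDemo
        Sub Main()
            ' Hypothetical input: three spaces where the three codes should be
            Dim s As String = "   text 123456"

            ' \w{0,n} allows every code to be empty, so this prints True
            Console.WriteLine(Regex.IsMatch(s, "^\w{0,2}\s\w{0,3}\s\w{0,3}\s(.*?)\s\d{6,8}\s?$"))

            ' \w{1,n} requires at least one word char per code, so this prints False
            Console.WriteLine(Regex.IsMatch(s, "^\w{1,2}\s\w{1,3}\s\w{1,3}\s(.*?)\s\d{6,8}\s?$"))
        End Sub
    End Module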

    Example code

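    ' Requires: Imports System.Text.RegularExpressions
    Dim s As String = "OK THT PHP This is it 06222021"
    ' "$1" keeps only capture group 1 and drops the rest of the match
    Dim result As String = Regex.Replace(s, "^\w{0,2}\s\w{0,3}\s\w{0,3}\s(.*?)\s\d{6,8}\s?$", "$1")
    Console.WriteLine(result)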
    

    Output

    This is it
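
    If you would rather read the group than replace around it, the other option from the question, a minimal sketch with Regex.Match could look like this. The module name and the "(no match)" fallback are only illustrative. Note that Regex.Replace returns the input unchanged when the pattern does not match, whereas here a failed match can be handled explicitly.

    Imports System
    Imports System.Text.RegularExpressions

    Module ExtractGroupDemo
        Sub Main()
            Dim pattern As String = "^\w{0,2}\s\w{0,3}\s\w{0,3}\s(.*?)\s\d{6,8}\s?$"
            Dim inputs As String() = {
                "OK THT PHP This is it 06222021",
                "NO MTM PYT Get this content 111111"
            }

            For Each s As String In inputs
                Dim m As Match = Regex.Match(s, pattern)
                If m.Success Then
                    ' Groups(0) is the whole match; Groups(1) is the capture group
                    Console.WriteLine(m.Groups(1).Value)
                Else
                    Console.WriteLine("(no match)")
                End If
            Next
        End Sub
    End Module

    For the two sample rows this prints "This is it" and "Get this content".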