Search code examples
regexvbalookbehind

VBA: REGEX LOOKBEHIND MS ACCESS 2010


I have a function that was written so that VBA can be used in MS Access I wish to do the following

I have set up my code below. Everything before the product works perfectly but trying to get the information behind just returns "" which is strange as when i execute it within Notepad++ it works perfectly fine

So it looks for the letters MIP and one of the 3 letter codes (any of them)

 StringToCheck = "MADHUBESOMIPTDTLTRCOYORGLEJ"

' PART 1
' If MIP appears in the string, then delete any of the following codes if they exist - DOM, DOX, DDI, ECX, LOW, WPX, SDX, DD6, DES, BDX, CMX,
' WMX, TDX, TDT, BSA, EPA, EPP, ACP, ACA, ACE, ACS, GMB, MAL, USP, NWP.
' EXAMPLE 1.  Flagged as: MADHUBESOMIPTDTLTRCOYORGLEJ, should be MADHUBESOMIPLTRCOYORGLEJ


Do While regexp(StringToCheck, "MIP(DOM|DOX|DDI|ECX|LOW|WPX|SDX|DD6|DES|BDX|CMX|WMX|TDX|TDT|BSA|EPA|EPP|ACP|ACA|ACE|ACS|GMB|MAL|USP|NWP|BBX)", False) <> ""
    ' SELECT EVERYTHING BEFORE THE THREE LETTER CODES
     strPart1 = regexp(StringToCheck, ".*^[^_]+(?=DOM|DOX|DDI|ECX|LOW|WPX|SDX|DD6|DES|BDX|CMX|WMX|TDX|TDT|BSA|EPA|EPP|ACP|ACA|ACE|ACS|GMB|MAL|USP|NWP|BBX)", False)
    ' SELECT EVERYTHING AFTER THE THREE LETTER CODES
     strPart2 = regexp(StringToCheck, "(?<=(DOM|DOX|DDI|ECX|LOW|WPX|SDX|DD6|DES|BDX|CMX|WMX|TDX|TDT|BSA|EPA|EPP|ACP|ACA|ACE|ACS|GMB|MAL|USP|NWP|BBX).*", False)
StringToCheck = strPart1 & strPart2
Loop

The function i am using which i have taken from the internet is below

Function regexp(StringToCheck As Variant, PatternToUse As String, Optional CaseSensitive As Boolean = True) As String

On Error GoTo RefErr:

Dim re As New regexp
re.Pattern = PatternToUse
re.Global = False
re.IgnoreCase = Not CaseSensitive

Dim m
For Each m In re.Execute(StringToCheck)
    regexp = UCase(m.Value)
Next

RefErr:
    On Error Resume Next

End Function

Solution

  • Just do it in two steps:

    1. Check if MIP is in the string
    2. If it is, remove the other codes.

    Like this:

    Sub Test()
      Dim StringToCheck As String
      StringToCheck = "MADHUBESOMIPTDTLTRCOYORGLEJ"
    
      Debug.Print StringToCheck
      Debug.Print CleanupString(StringToCheck)
    End Sub
    
    Function CleanupString(str As String) As String
      Dim reCheck As New RegExp
      Dim reCodes As New RegExp
    
      reCheck.Pattern = "^(?:...)*?MIP"
      reCodes.Pattern = "^((?:...)*?)(?:DOM|DOX|DDI|ECX|LOW|WPX|SDX|DD6|DES|BDX|CMX|WMX|TDX|TDT|BSA|EPA|EPP|ACP|ACA|ACE|ACS|GMB|MAL|USP|NWP|BBX)"
      reCodes.Global = True
    
      If reCheck.Test(str) Then
        While reCodes.Test(str)
          str = reCodes.Replace(str, "$1")
        Wend
      End If
    
      CleanupString = str
    End Function
    

    Note that the purpose of (?:...)*? is to group the letters in threes.


    Since the VBScript regular expression engine does support look-aheads, you can of course also do it in a single regex:

    Function CleanupString(str As String) As String
      Dim reClean As New RegExp
    
      reClean.Pattern = "^(?=(?:...)*?MIP)((?:...)*?)(?:DOM|DOX|DDI|ECX|LOW|WPX|SDX|DD6|DES|BDX|CMX|WMX|TDX|TDT|BSA|EPA|EPP|ACP|ACA|ACE|ACS|GMB|MAL|USP|NWP|BBX)"
    
      While reClean.Test(str)
        str = reClean.Replace(str, "$1")
      Wend
    
      CleanupString = str
    End Function
    

    Personally, I like the two-step check/remove pattern better because it is a lot more obvious and therefore more maintainable.