
Match Methods with References to Global Variables


This question is heavily related to this one, but it has to do with grabbing methods that contain references to global variables (not commented out).

I'm testing the following regular expression against the test string below, but it only partially works:

Regular Expression

^((?:(?:Public|Private)\s+)?(?:Function|Sub).+)[\s\S]+?(GLOBAL_VARIABLE_1)[\s\S]+?End\s+(?:Function|Sub)$

(I need the first capturing group so that I can grab the name of the method as a sub-match.)

Test String

'-----------------------------------------------------------------------------------------
'
'   the code:   Header
'
'-----------------------------------------------------------------------------------------

Dim GLOBAL_VARIABLE_1
Dim GLOBAL_VARIABLE_2
Dim GLOBAL_VARIABLE_3

Public Function doThis(byVal xml)
'' Created               : dd/mm/yyyy
'' Return                : string
'' Param            : xml- an xml blob

     return = replace(xml, "><", ">" & vbLf & "<")

     GLOBAL_VARIABLE_1 = 2 + 2

     doThis = return

End Function


msgbox GLOBAL_VARIABLE_1



Public Function doThat(byVal xPath)
'' Created               : dd/mm/yyyy
'' Return                : array
' 'Param            : xPath

     return = split(mid(xPath, 2), "/")

     GLOBAL_VARIABLE_2 = 2 + 2


     doThat = return

End Function


GLOBAL_VARIABLE_2 = 2 + 2


Public Sub butDontDoThis()
'' Created               : dd/mm/yyyy
'' Return                : string
' 'Param            : obj

     For i = 0 To 5
          return = return & "bye" & " "

     Next

End Sub


GLOBAL_VARIABLE_3 = 3 + 3


Public Sub alsoDoThis(byRef obj)
'' Created               : dd/mm/yyyy
'' Return                : string
' 'Param            : obj, an xml document object

     For i = 0 To 4
          return = return & "hi" & " "

     Next

     GLOBAL_VARIABLE_1 = 1 + 1

End Sub


GLOBAL_VARIABLE_3 = 3 + 3

Using http://www.regexpal.com/, I can highlight the first method that references the global variable, but the regular expression does not do what I expect with the other methods. It also picks up methods that contain no reference to the specific global variable: the match starts at such a method and only ends at the last method that actually uses the variable. I believe the problem is the [\s\S]+?(GLOBAL_VARIABLE_1)[\s\S]+?End\s+(?:Function|Sub)$ part: the minimal / non-greedy match keeps scanning, past the end of the current method if necessary, until it finds an actual match.
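The over-matching can be reproduced outside regexpal. Below is a minimal sketch in Python, assuming a cut-down version of the test string. The name SOURCE and the trimmed method bodies are illustrative only; the pattern is the one from the question, unchanged.

```python
import re

# Cut-down test string: only doThis and alsoDoThis touch GLOBAL_VARIABLE_1.
SOURCE = """Public Function doThis(byVal xml)
     GLOBAL_VARIABLE_1 = 2 + 2
End Function

Public Function doThat(byVal xPath)
     GLOBAL_VARIABLE_2 = 2 + 2
End Function

Public Sub alsoDoThis(byRef obj)
     GLOBAL_VARIABLE_1 = 1 + 1
End Sub
"""

# The pattern from the question; MULTILINE makes ^ and $ anchor at every line.
PATTERN = re.compile(
    r"^((?:(?:Public|Private)\s+)?(?:Function|Sub).+)[\s\S]+?(GLOBAL_VARIABLE_1)"
    r"[\s\S]+?End\s+(?:Function|Sub)$",
    re.MULTILINE,
)

matches = list(PATTERN.finditer(SOURCE))
for m in matches:
    # Show the captured declaration line and how many source lines the match covers.
    print(repr(m.group(1)), "spans", m.group(0).count("\n") + 1, "lines")
```

The second match begins at the declaration of doThat and runs through the End Sub of alsoDoThis. Nothing in [\s\S]+? stops it at the first End Function, so the lazy quantifier simply keeps expanding until GLOBAL_VARIABLE_1 turns up.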

In summary, the expression should follow these rules:

  • It should stop scanning the method it is currently checking as soon as it sees the end of that method's declaration. In this example, only the doThis and alsoDoThis methods should be matched for GLOBAL_VARIABLE_1, but I'm not sure what the regular expression should be.
  • It should only match methods that actually use the global variable.
  • If GLOBAL_VARIABLE_1 is commented out, it is not really being used by the method, so a commented-out GLOBAL_VARIABLE_1 should not trigger a positive match for the method.

Solution

  • Description

    I'd do this in two steps. First, identify each of your functions and subs. Here I'm using the backreference \1 to ensure we match the correct End Function or End Sub. This regex also grabs the function name and places it into group 2, which can be used later if part 2 finds a match.

    (?:Public|Private)\s+(Function|Sub)\s+([a-z0-9]*).*?End\s+\1
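A minimal sketch of this first step in Python, assuming a shortened test string. find_methods and SOURCE are illustrative names, not part of any library. re.DOTALL plays the role of RegexOptions.Singleline here. VBScript's RegExp object has no equivalent flag, so there the .*? would have to be written as [\s\S]*?.

```python
import re

# Step 1: one match per method. Group 1 is the keyword (Function or Sub),
# group 2 is the method name; \1 forces the closing keyword to equal the opening one.
METHOD_RE = re.compile(
    r"(?:Public|Private)\s+(Function|Sub)\s+([a-z0-9]*).*?End\s+\1",
    re.IGNORECASE | re.DOTALL,
)

def find_methods(source):
    """Return a (name, full_text) pair for every method found in source."""
    return [(m.group(2), m.group(0)) for m in METHOD_RE.finditer(source)]

SOURCE = """Dim GLOBAL_VARIABLE_1

Public Function doThis(byVal xml)
     GLOBAL_VARIABLE_1 = 2 + 2
End Function

msgbox GLOBAL_VARIABLE_1

Public Sub butDontDoThis()
     For i = 0 To 5
     Next
End Sub
"""

methods = find_methods(SOURCE)
for name, text in methods:
    print(name, "->", len(text.splitlines()), "lines")
```

Code outside any method, such as the msgbox line, never ends up in a match, so a reference there cannot be attributed to a method by mistake.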

    Then test each of these matches to see whether it contains your variable. This test uses multiline matching so that ^ anchors at the start of every line, and [^']*? ensures that no comment character appears before GLOBAL_VARIABLE_1 on the same line. It also checks that GLOBAL_VARIABLE_1 is not preceded by either of the following:

    • an alphanumeric character or an _ separator, which would make it part of a longer name. The \b word boundary covers this, because word characters are letters, digits and the underscore. A hyphen - is deliberately not treated as part of a name, since it could be a minus sign used in an equation.
    • the comment character '

    ^[^']*?\bGLOBAL_VARIABLE_1
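Putting the two steps together, here is a minimal sketch in Python. It assumes the simplified second-step pattern ^[^']*?\bNAME, which relies on [^'] for the comment check and on \b for the "not part of a longer name" check. uses_variable, methods_using, SOURCE and the commentedOut method are hypothetical names made up for illustration.

```python
import re

# Step 1: isolate each method; group 2 is the method name.
METHOD_RE = re.compile(
    r"(?:Public|Private)\s+(Function|Sub)\s+([a-z0-9]*).*?End\s+\1",
    re.IGNORECASE | re.DOTALL,
)

def uses_variable(method_text, variable):
    """True if variable occurs with no ' between the start of a line and it."""
    pattern = r"^[^']*?\b" + re.escape(variable)
    return re.search(pattern, method_text, re.IGNORECASE | re.MULTILINE) is not None

def methods_using(source, variable):
    """Names of the methods that contain an uncommented reference to variable."""
    return [
        m.group(2)
        for m in METHOD_RE.finditer(source)
        if uses_variable(m.group(0), variable)
    ]

SOURCE = """Public Function doThis(byVal xml)
'' Param            : xml- an xml blob
     GLOBAL_VARIABLE_1 = 2 + 2
End Function

Public Function doThat(byVal xPath)
     GLOBAL_VARIABLE_2 = 2 + 2
End Function

Public Sub commentedOut()
     ' GLOBAL_VARIABLE_1 = 2 + 2
End Sub

Public Sub alsoDoThis(byRef obj)
     GLOBAL_VARIABLE_1 = 1 + 1
End Sub
"""

result = methods_using(SOURCE, "GLOBAL_VARIABLE_1")
print(result)
```

With this input only doThis and alsoDoThis are reported. doThat references a different variable, and the reference in commentedOut sits behind a comment character.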


    VB Part 1

    Imports System.Text.RegularExpressions
    Module Module1
      Sub Main()
        Dim sourcestring as String = "replace with your source string"
        Dim re As Regex = New Regex("(?:Public|Private)\s+(Function|Sub)\s+([a-z0-9]*).*?End\s+\1",RegexOptions.IgnoreCase OR RegexOptions.Singleline)
        Dim mc as MatchCollection = re.Matches(sourcestring)
        Dim mIdx as Integer = 0
        For each m as Match in mc
          For groupIdx As Integer = 0 To m.Groups.Count - 1
            Console.WriteLine("[{0}][{1}] = {2}", mIdx, re.GetGroupNames(groupIdx), m.Groups(groupIdx).Value)
          Next
          mIdx=mIdx+1
        Next
      End Sub
    End Module
    
    $matches Array:
    (
        [0] => Array
            (
                [0] => Public Function doThis(byVal xml)
    '' Created               : dd/mm/yyyy
    '' Return                : string
    '' Param            : xml- an xml blob
    
         return = replace(xml, "><", ">" & vbLf & "<")
    
         GLOBAL_VARIABLE_1 = 2 + 2
    
         doThis = return
    
    End Function
                [1] => Public Function doThat(byVal xPath)
    '' Created               : dd/mm/yyyy
    '' Return                : array
    ' 'Param            : xPath
    
         return = split(mid(xPath, 2), "/")
    
         GLOBAL_VARIABLE_2 = 2 + 2
    
    
         doThat = return
    
    End Function
                [2] => Public Sub butDontDoThis()
    '' Created               : dd/mm/yyyy
    '' Return                : string
    ' 'Param            : obj
    
         For i = 0 To 5
              return = return & "bye" & " "
    
         Next
    
    End Sub
                [3] => Public Sub alsoDoThis(byRef obj)
    '' Created               : dd/mm/yyyy
    '' Return                : string
    ' 'Param            : obj, an xml document object
    
         For i = 0 To 4
              return = return & "hi" & " "
    
         Next
    
         GLOBAL_VARIABLE_1 = 1 + 1
    
    End Sub
            )
    
        [1] => Array
            (
                [0] => Function
                [1] => Function
                [2] => Sub
                [3] => Sub
            )
    
        [2] => Array
            (
                [0] => doThis
                [1] => doThat
                [2] => butDontDoThis
                [3] => alsoDoThis
            )
    
    )
    

    VB Part 2

    Found in this text

    Public Function doThis(byVal xml)
    '' Created               : dd/mm/yyyy
    '' Return                : string
    '' Param            : xml- an xml blob
    
         return = replace(xml, "><", ">" & vbLf & "<")
    
         GLOBAL_VARIABLE_1 = 2 + 2
    
         doThis = return
    
    End Function
    

    example

    Imports System.Text.RegularExpressions
    Module Module1
      Sub Main()
        Dim sourcestring as String = "replace with your source string"
        Dim re As Regex = New Regex("^[^']*?GLOBAL_VARIABLE_1",RegexOptions.IgnoreCase OR RegexOptions.Multiline)
        Dim mc as MatchCollection = re.Matches(sourcestring)
        Dim mIdx as Integer = 0
        For each m as Match in mc
          For groupIdx As Integer = 0 To m.Groups.Count - 1
            Console.WriteLine("[{0}][{1}] = {2}", mIdx, re.GetGroupNames(groupIdx), m.Groups(groupIdx).Value)
          Next
          mIdx=mIdx+1
        Next
      End Sub
    End Module
    
    $matches Array:
    (
        [0] => Array
            (
                [0] =>  Param            : xml- an xml blob
    
         return = replace(xml, "><", ">" & vbLf & "<")
    
         GLOBAL_VARIABLE_1
            )
    
    )
    

    not found in this text

    Public Function doThis(byVal xml)
    '' Created               : dd/mm/yyyy
    '' Return                : string
    '' Param            : xml- an xml blob
    
         return = replace(xml, "><", ">" & vbLf & "<")
    
      '   GLOBAL_VARIABLE_1 = 2 + 2
    
         doThis = return
    
    End Function
    

    example

    Imports System.Text.RegularExpressions
    Module Module1
      Sub Main()
        Dim sourcestring as String = "replace with your source string"
        Dim re As Regex = New Regex("^[^']*?GLOBAL_VARIABLE_1",RegexOptions.IgnoreCase OR RegexOptions.Multiline)
        Dim mc as MatchCollection = re.Matches(sourcestring)
        Dim mIdx as Integer = 0
        For each m as Match in mc
          For groupIdx As Integer = 0 To m.Groups.Count - 1
            Console.WriteLine("[{0}][{1}] = {2}", mIdx, re.GetGroupNames(groupIdx), m.Groups(groupIdx).Value)
          Next
          mIdx=mIdx+1
        Next
      End Sub
    End Module
    
    Matches Found:
    NO MATCHES.
    

    also not found in this text

    Public Sub butDontDoThis()
    '' Created               : dd/mm/yyyy
    '' Return                : string
    ' 'Param            : obj
    
         For i = 0 To 5
              return = return & "bye" & " "
    
         Next
    
    End Sub
    

    example

    Imports System.Text.RegularExpressions
    Module Module1
      Sub Main()
        Dim sourcestring as String = "Public Sub butDontDoThis()
        '' Created               : dd/mm/yyyy
         '' Return                : string
         ' 'Param            : obj
    
         For i = 0 To 5
              return = return & ""bye"" & "" ""
    
         Next
    
    End Sub"
        Dim re As Regex = New Regex("^[^']*?GLOBAL_VARIABLE_1",RegexOptions.IgnoreCase OR RegexOptions.Multiline)
        Dim mc as MatchCollection = re.Matches(sourcestring)
        Dim mIdx as Integer = 0
        For each m as Match in mc
          For groupIdx As Integer = 0 To m.Groups.Count - 1
            Console.WriteLine("[{0}][{1}] = {2}", mIdx, re.GetGroupNames(groupIdx), m.Groups(groupIdx).Value)
          Next
          mIdx=mIdx+1
        Next
      End Sub
    End Module
    
    Matches Found:
    NO MATCHES.
    

    Disclaimer

    There are a lot of edge cases that can trip this up, for example if you have a comment containing ' end function, or if you assign a string value to a variable, like thisstring = "end sub".
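One of these edge cases can be seen in a short sketch in Python. It assumes a hypothetical method logIt whose body contains the string "end sub"; TRICKY and logIt are made-up names.

```python
import re

# The step 1 pattern: isolate one method per match.
METHOD_RE = re.compile(
    r"(?:Public|Private)\s+(Function|Sub)\s+([a-z0-9]*).*?End\s+\1",
    re.IGNORECASE | re.DOTALL,
)

# Hypothetical method whose body mentions "end sub" inside a string literal.
TRICKY = '''Public Sub logIt()
     thisstring = "end sub"
     GLOBAL_VARIABLE_1 = 1 + 1
End Sub
'''

m = METHOD_RE.search(TRICKY)
captured = m.group(0)
print(repr(captured))

# The lazy .*? stops at the "end sub" inside the string literal, so the
# GLOBAL_VARIABLE_1 assignment falls outside the captured method text.
print("GLOBAL_VARIABLE_1" in captured)
```

The captured text ends inside the string literal, so the second step would report that logIt does not use the variable even though it does. Handling this reliably would mean tokenizing strings and comments before matching, which goes beyond a regex-only approach.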

    Yes, I realize the OP asked about VBScript. I've included these VB.NET examples to demonstrate the overall logic and to show that the regular expressions work.