Performing analytics on a script file

I am learning analytics with some friends, and one of them recently gave me a problem that I am struggling with. He provided a large VB.net script (around 17,000 lines; it is what he works with), and I am supposed to pair each sub with the hashtags that appear in it.

A sample of the code is presented below:

Sub NewEPU(bLog)

arrList= Glo_arrMR_E_Base
arrList= Filter(arrList,"E#[None]",FALSE,0)
arrList2= Glo_arrMR_A_Base
'deactivated: incorrect  SB21042016'
'arrList2= Filter(arrList2,"A#[Owned]",FALSE,0)
'arrList2= Filter(arrList2,"A#[Outstanding]",FALSE,0)

For each strMR_E in arrList
    For each strMR_A in arrList2

            HS.NoInput "E#" & strMR_E & ".A#" & strMR_A & ".V#[None]"
    Next
Next
End Sub

So basically, my new code should go through this sub (NewEPU) and report that it contains E#, A#, and V#. The pseudo-script I had in mind was:

  <read files>
  <search for '*#'>
     <If found>
          <Search 'sub' before & read name>
          <Search 'sub' after & read name>
     <If not found>
          <Do nothing>

I was thinking of using Python, but NLTK splits the subs apart and does not help me build the logic above. Does anyone know how to solve this? Is there perhaps a better tool or a better language for the job?

Solution

I found a solution to the problem.

First I find the lines that hold the closing statements, using enumerate: End = [i for i, s in enumerate(script) if 'End Sub' in s]

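Next I find the lines that open a sub by splitting each line and checking that its first word is 'Sub'. The opening line has the sub's name after 'Sub', while 'End Sub' has nothing after it; note that a plain 'Sub' in s.split() test would also match the 'End Sub' lines, so only the first word is checked: id_Sub = [i for i, s in enumerate(script) if s.split()[:1] == ['Sub']]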
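
As a rough sketch of how the two index lists can be combined, the snippet below pairs each opening line with its closing line. The `script` list is a made-up stand-in for the real file, and `sub_name` and `blocks` are hypothetical names introduced here. It checks the first word of each line rather than any word, because 'End Sub'.split() also contains 'Sub'. It assumes every sub starts with a bare `Sub` keyword and that subs are never nested.

```python
# Hypothetical stand-in for the real file, one string per line.
script = [
    "Sub NewEPU(bLog)",
    'HS.NoInput "E#" & strMR_E',
    "End Sub",
    "",
    "Sub Other()",
    "End Sub",
]

# Line numbers of the closing statements.
End = [i for i, s in enumerate(script) if 'End Sub' in s]

# Line numbers of the opening statements: the first word must be 'Sub'.
id_Sub = [i for i, s in enumerate(script) if s.split()[:1] == ['Sub']]


def sub_name(line):
    """Take the word after 'Sub' and drop the parameter list."""
    return line.split()[1].split("(")[0]


# Subs are assumed not to nest, so the n-th opening line
# pairs with the n-th closing line.
blocks = [(sub_name(script[a]), a, b) for a, b in zip(id_Sub, End)]
```

Each entry of `blocks` holds a sub name with its first and last line number, so any line number found later can be checked against those ranges.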

From there I search for '#' and retrieve the matching lines. After that, it is a simple comparison of line numbers in a DataFrame to see which sub each match falls in.
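
For comparison, here is a minimal single-pass sketch of the same idea that does not need the index lists. It is not the code I actually ran. The regular expressions, the `tags_by_sub` function and the `SAMPLE` text are illustrative assumptions: a tag is taken to be a word immediately followed by '#', an optional Public/Private may precede Sub, and lines that start with an apostrophe are treated as comments and skipped.

```python
import re

# Shortened copy of the sub from the question, used as sample input.
SAMPLE = """Sub NewEPU(bLog)
arrList= Glo_arrMR_E_Base
arrList= Filter(arrList,"E#[None]",FALSE,0)
'arrList2= Filter(arrList2,"A#[Owned]",FALSE,0)
For each strMR_E in arrList
    For each strMR_A in arrList2
            HS.NoInput "E#" & strMR_E & ".A#" & strMR_A & ".V#[None]"
    Next
Next
End Sub
"""

SUB_START = re.compile(r"^\s*(?:(?:Public|Private)\s+)?Sub\s+(\w+)", re.IGNORECASE)
SUB_END = re.compile(r"^\s*End\s+Sub\b", re.IGNORECASE)
TAG = re.compile(r"\b([A-Za-z]\w*)#")  # assumed tag shape: a word followed by '#'


def tags_by_sub(text):
    """Map each sub name to the distinct tags found in its body, in order."""
    result = {}
    current = None  # name of the sub being read, or None when outside a sub
    for line in text.splitlines():
        if line.lstrip().startswith("'"):
            continue  # skip comment lines (an assumption; drop this to count them)
        start = SUB_START.match(line)
        if start:
            current = start.group(1)
            result[current] = []
        elif SUB_END.match(line):
            current = None
        elif current is not None:
            for tag in TAG.findall(line):
                label = tag + "#"
                if label not in result[current]:
                    result[current].append(label)
    return result


pairs = tags_by_sub(SAMPLE)
# Flatten to (sub, tag) rows, e.g. for pandas.DataFrame(rows, columns=["sub", "tag"]).
rows = [(name, tag) for name, tags in pairs.items() for tag in tags]
```

For the sample sub this yields E#, A# and V# under NewEPU. On the real file the text would be read from disk first, and the tag pattern may need adjusting if the script uses '#' for anything else.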