I am learning analytics with some friends, and I was recently given a problem that I am struggling with. A friend gave me a large VB.net script (around 17,000 lines, since this is what he works with), and I am supposed to pair each Sub with the hashtag codes (such as E# or A#) that appear inside it.
A sample of the code is presented below:
Sub NewEPU(bLog)
    arrList= Glo_arrMR_E_Base
    arrList= Filter(arrList,"E#[None]",FALSE,0)
    arrList2= Glo_arrMR_A_Base
    'deactivated: incorrect SB21042016'
    'arrList2= Filter(arrList2,"A#[Owned]",FALSE,0)
    'arrList2= Filter(arrList2,"A#[Outstanding]",FALSE,0)
    For each strMR_E in arrList
        For each strMR_A in arrList2
            HS.NoInput "E#" & strMR_E & ".A#" & strMR_A & ".V#[None]"
        Next
    Next
End Sub
So basically, my new code should go through this sub (NewEPU) and report that it contains E#, A#, and V#. The pseudo-script I had in mind was:
<read files>
<search for '*#'>
    <If found>
        <Search 'sub' before & read name>
        <Search 'sub' after & read name>
    <If not found>
        <Do nothing>
I was thinking of doing this in Python, but NLTK splits the subs into tokens, which does not help me build the logic above. Does anyone know how to solve this? Is there perhaps a better tool or a better language for it?
I found a solution to the problem.
First, I find the line numbers of the closing statements with enumerate: End = [i for i, s in enumerate(script) if 'End Sub' in s]
Next, I find the lines that open a Sub by splitting each line and checking its first word. The Sub keyword is followed by the sub's name, whereas 'End Sub' is not followed by anything, but 'End Sub'.split() still contains 'Sub', so only the first word is compared: id_Sub = [i for i, s in enumerate(script) if s.split()[:1] == ['Sub']]
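As a minimal sketch, here are these two steps on a shortened version of the sample. The names `script`, `sub_starts`, `sub_ends` and `sub_ranges` are illustrative only. The opening lines are found by looking at the first word alone, which assumes every declaration starts with the word Sub (a line such as `Private Sub ...` would be missed) and that subs are never nested.

```python
# Shortened sample; in practice `script` would be the lines read from the file.
script = [
    "Sub NewEPU(bLog)",
    "arrList= Glo_arrMR_E_Base",
    'HS.NoInput "E#" & strMR_E & ".A#" & strMR_A & ".V#[None]"',
    "End Sub",
    "Sub Other()",
    "x = 1",
    "End Sub",
]

# Line numbers of the closing statements.
sub_ends = [i for i, s in enumerate(script) if "End Sub" in s]

# Line numbers of the opening statements: the first word must be 'Sub'.
sub_starts = [i for i, s in enumerate(script) if s.split()[:1] == ["Sub"]]

# Pair each opening line with its closing line. This assumes every Sub is
# closed and none are nested, so the two lists line up one to one.
sub_ranges = list(zip(sub_starts, sub_ends))
print(sub_ranges)  # [(0, 3), (4, 6)]
```

Any line whose number falls inside one of these ranges belongs to the sub that opens the range.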
From there, I search for # and record the lines where it appears. After that, it is a simple comparison in a DataFrame: each # line belongs to the sub whose opening and closing lines enclose it.
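For anyone who wants the whole thing in one place, here is a rough sketch of the pairing. It is not the exact code I ran: it walks the script in a single pass and uses a plain dictionary instead of the index lists and DataFrame. The function name `tags_per_sub` and the three patterns are my own illustrative choices. It assumes a tag is a single uppercase letter followed by #, that whole-line comments (starting with ') should be ignored, and that subs are not nested.

```python
import re

# VB keywords are case-insensitive, so the Sub patterns ignore case.
SUB_START = re.compile(r"^\s*Sub\s+(\w+)", re.IGNORECASE)  # e.g. "Sub NewEPU(bLog)"
SUB_END = re.compile(r"^\s*End\s+Sub\b", re.IGNORECASE)
TAG = re.compile(r"\b([A-Z])#")  # assumption: tags look like E#, A#, V#


def tags_per_sub(lines):
    """Map each Sub name to the tags found between its Sub and End Sub lines."""
    result = {}
    current = None  # name of the Sub being read, or None when outside a Sub
    for line in lines:
        if line.lstrip().startswith("'"):
            continue  # assumption: whole-line comments are ignored
        start = SUB_START.match(line)
        if start:
            current = start.group(1)
            result[current] = []
        elif SUB_END.match(line):
            current = None
        elif current is not None:
            for letter in TAG.findall(line):
                tag = letter + "#"
                if tag not in result[current]:
                    result[current].append(tag)  # keep first-seen order
    return result


sample = """Sub NewEPU(bLog)
arrList= Glo_arrMR_E_Base
arrList= Filter(arrList,"E#[None]",FALSE,0)
arrList2= Glo_arrMR_A_Base
'arrList2= Filter(arrList2,"A#[Owned]",FALSE,0)
For each strMR_E in arrList
For each strMR_A in arrList2
HS.NoInput "E#" & strMR_E & ".A#" & strMR_A & ".V#[None]"
Next
Next
End Sub""".splitlines()

print(tags_per_sub(sample))  # {'NewEPU': ['E#', 'A#', 'V#']}
```

For the real file, the lines would come from reading the script from disk, and the resulting dictionary can be loaded into a DataFrame if a table is easier to work with.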