Tags: python, regex, python-3.x, capture-group, named-captures

Python Regular Expression Named Capture Groups


I'm learning regular expressions, specifically named capture groups.

I can't figure out how to write an if/else statement for my function findVul().

Here is how the code works: findVul() is called on each of data1 and data2, which have been added to the list myDATA.

If the regex finds a match for the entire named group, the function prints the results. This part works correctly.

CODE:

import re

data1 = '''

dwadawa231d .2 vulnerabilities discovered dasdfadfad .One vulnerability discovered 123e2121d21 .12 vulnerabilities discovered sgwegew342 dawdwadasf

2r3232r32ee

'''

data2 = ''' d21d21 .2 vul discovered adqdwdawd .One vulnerability disc d12d21d .two vulnerabilities discovered 2e1e21d1d f21f21

'''

def findVul(data):
    pattern = re.compile(r'(?P<VUL>(\d{1,2}|One)\s+(vulnerabilities|vulnerability)\s+discovered)')
    match = re.finditer(pattern, data)

    for x in match:
        print(x.group())


myDATA = [data1, data2]
count_data = 1

for x in myDATA:
    print('\n--->Reading data{0}\n'.format(count_data))
    count_data+=1
    findVul(x)

OUTPUT:

--->Reading data1

2 vulnerabilities discovered
One vulnerability discovered
12 vulnerabilities discovered

--->Reading data2
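In the code above, x.group() returns the whole match, which for this pattern is the same text as the named group VUL because VUL wraps the entire expression. Here is a minimal sketch of the other ways a match object exposes named and numbered groups. It runs on a fragment assumed to be copied from data1:

```python
import re

pattern = re.compile(r'(?P<VUL>(\d{1,2}|One)\s+(vulnerabilities|vulnerability)\s+discovered)')

# A fragment of data1 from the question
m = pattern.search('dwadawa231d .2 vulnerabilities discovered dasdfadfad')

whole = m.group()       # group 0: the entire match
vul = m.group('VUL')    # the named group, looked up by name
count = m.group(2)      # unnamed inner groups are reachable only by number
named = m.groupdict()   # dict of named groups only

print(whole)                     # 2 vulnerabilities discovered
print(vul)                       # 2 vulnerabilities discovered
print(count)                     # 2
print(named)                     # {'VUL': '2 vulnerabilities discovered'}
print(dict(pattern.groupindex))  # {'VUL': 1}
```

Named groups are also numbered (VUL is group 1), which is why the two inner groups are groups 2 and 3.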

Now I want to add an if/else statement to check if there are any matches for the entire named group.

I tried the following, but it doesn't work as expected.

CODE:

def findVul(data):
    pattern = re.compile(r'(?P<VUL>(\d{1,2}|One)\s+(vulnerabilities|vulnerability)\s+discovered)')
    match = re.finditer(pattern, data)

    if len(list(match)) != 0:
        print('\nVulnerabilities Found!\n')
        for x in match:
            print(x.group())

    else:
        print('No Vulnerabilities Found!\n')

OUTPUT:

--->Reading data1


Vulnerabilities Found!


--->Reading data2

No Vulnerabilities Found!

As you can see, it doesn't print the vulnerabilities that should be found in data1.
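This behaviour can be reproduced in isolation. re.finditer() returns a one-shot iterator, so measuring it with len(list(match)) pulls every match out of it, and any later loop over the same iterator gets nothing. A small sketch using a shortened, assumed sample string:

```python
import re

pattern = re.compile(r'(?P<VUL>(\d{1,2}|One)\s+(vulnerabilities|vulnerability)\s+discovered)')
data = '.2 vulnerabilities discovered .One vulnerability discovered'

match = pattern.finditer(data)

# The same check as in the question: building the list consumes the iterator
count = len(list(match))

# This is what the for loop in the question sees afterwards
leftover = [x.group() for x in match]

print(count)     # 2
print(leftover)  # []
```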

Could someone please explain the correct way to do this and why my logic is wrong? Thanks so much!


Solution

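  • I did some more research after @AdamKG's response.

    The problem with my attempt is that re.finditer() returns an iterator, not a list. Calling list(match) to measure its length consumes every match, so the for loop that follows finds the iterator already empty.

    I decided to use the re.findall() function instead.

    re.findall() returns a list of all matches. The form of each item depends on how many capture groups the pattern has: with no groups it is the matched substring, with exactly one group it is that group's text, and with more than one group it is a tuple with one string per group. In my case the named group VUL contains two more capture groups, so the pattern has three groups in total and findall() returns a list of tuples.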

    For example, applying the following pattern to data1:

    pattern = re.compile(r'(?P<VUL>(\d{1,2}|One)\s+(vulnerabilities|vulnerability)\s+discovered)')

    match = re.findall(pattern, data)

    This returns a list of tuples (the whole VUL match first, then the two inner groups):

    [('2 vulnerabilities discovered', '2', 'vulnerabilities'),
     ('One vulnerability discovered', 'One', 'vulnerability'),
     ('12 vulnerabilities discovered', '12', 'vulnerabilities')]

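    My final code for findVul():

    def findVul(data):
        pattern = re.compile(r'(?P<VUL>(\d{1,2}|One)\s+(vulnerabilities|vulnerability)\s+discovered)')
        # findall() returns a list, so it can be checked and then looped over
        match = pattern.findall(data)

        if match:  # an empty list is falsy
            print('Vulnerabilities Found!\n')
            for x in match:
                # x is a tuple (VUL, number, noun); x[0] is the whole named group
                print('--> {0}'.format(x[0]))
        else:
            print('No Vulnerabilities Found!\n')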
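Two other ways to get the same result are sketched below, assuming the rest of the script stays as in the question. The function names are hypothetical. find_vul_finditer keeps re.finditer() but turns its iterator into a list once, so the emptiness check and the loop share that list, and each match's named group is read with m.group('VUL'). find_vul_strings makes the inner groups non-capturing with (?:...), which leaves VUL as the only group, so findall() returns plain strings instead of tuples.

```python
import re

# Same pattern as the question: VUL plus two inner capturing groups
PATTERN = re.compile(
    r'(?P<VUL>(\d{1,2}|One)\s+(vulnerabilities|vulnerability)\s+discovered)'
)

# Inner groups made non-capturing, so VUL is the only group
PATTERN_ONE_GROUP = re.compile(
    r'(?P<VUL>(?:\d{1,2}|One)\s+(?:vulnerabilities|vulnerability)\s+discovered)'
)


def find_vul_finditer(data):
    # Materialize the iterator once; the list can be tested and then looped over
    matches = list(PATTERN.finditer(data))
    if matches:
        print('Vulnerabilities Found!\n')
        for m in matches:
            print('--> {0}'.format(m.group('VUL')))
    else:
        print('No Vulnerabilities Found!\n')
    # Also return the matched text so the caller can reuse it
    return [m.group('VUL') for m in matches]


def find_vul_strings(data):
    # With exactly one group, findall() returns that group's text as strings
    return PATTERN_ONE_GROUP.findall(data)


sample = ('.2 vulnerabilities discovered .One vulnerability discovered '
          '.12 vulnerabilities discovered')
print(find_vul_finditer(sample))
print(find_vul_strings(sample))
```

Passing data2 to either function gives an empty result. "2 vul discovered" and "One vulnerability disc" are incomplete, and "two vulnerabilities discovered" is rejected because the pattern only accepts digits or the capitalized word One.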