python regex regex-group

Regular expression to return all match occurrences


I have text like below-

02052020 02:40:02.445: Vacation Allowance: 21; nnnnnn Vacation Allowance: 22;nnn

I want to extract the below in Python-

Vacation Allowance: 21
Vacation Allowance: 22

Basically, I want to extract every occurrence of "Vacation Allowance:" together with the numeric value that follows it, which is terminated by a ;

I'm using the below regular expression-

(.*)(Vacation Allowance:)(.*);(.*)

Full Python code below-

import re

text = '02/05/2020 Vacation Allowance: 21; 02/05/2020 Vacation Allowance: 22; nnn'

pattern = re.compile(r'(.*)(Vacation Allowance:)(.*);(.*)')

for (a,b,c,d) in re.findall(pattern, text):
    print(b, " ", c)

This does not give all occurrences, only the last one. The current output is-

Vacation Allowance: 22

Can you please comment on how I can extract all occurrences?


Solution

  • The issue is with the regular expression. The (.*) groups match more of the string than you expect: .* is greedy, so it consumes as much of the string as it can while still letting the overall pattern match. The first (.*) therefore swallows everything up to the last "Vacation Allowance:", the whole line becomes a single match, and re.findall returns only one result.

    Instead, match only the part you need, for example Vacation Allowance:\s*\d+; or similar.

    import re

    text = '02/05/2020 Vacation Allowance: 21; 02/05/2020 Vacation Allowance: 22; nnn'
    # Raw string keeps \s and \d intact; \d+ requires at least one digit before the ;
    m = re.findall(r'Vacation Allowance:\s*(\d+);', text)
    print(m)
    

    result: ['21', '22']
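To get the exact lines asked for in the question ("Vacation Allowance: 21" and so on) rather than only the numbers, one option is to capture the label as well. The following is a minimal sketch that reuses the sample text from the question; the variable names greedy, pattern and lines are only illustrative. It also shows that the original greedy pattern produces a single match on this input.

```python
import re

text = '02/05/2020 Vacation Allowance: 21; 02/05/2020 Vacation Allowance: 22; nnn'

# Original pattern: the leading (.*) is greedy and runs up to the LAST
# "Vacation Allowance:", so the whole string is consumed by one match.
greedy = re.findall(r'(.*)(Vacation Allowance:)(.*);(.*)', text)
print(len(greedy))

# Targeted pattern: capture the label and the digits, with no wildcards
# before or after, so each occurrence is matched separately.
pattern = re.compile(r'(Vacation Allowance:)\s*(\d+);')
lines = [f'{label} {value}' for label, value in pattern.findall(text)]
for line in lines:
    print(line)
```

With two capture groups, re.findall returns a list of (label, value) tuples, which is why the loop can unpack each match directly.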