python regex pattern-matching non-greedy

Most non-greedy regex match in Python (or just regex in general)


I'm having an issue where my regex matches too much. I've tried making it as non-greedy as possible. My regex is:

 define host( |\t)*{(.*\n)*?( |\t)*host_name( |\t)*HOST_B(.*\n)*?( |\t)*}

meaning

"define host" followed by any spaces or tabs followed by "{". Any text and newlines until any number of spaces or tabs followed by "host_name" followed by any number of spaces or tabs followed by "HOST_B". Any text plus newlines until any spaces or tabs followed by "}"

My text is

define host{
    field stuff
        }

define timeperiod{
        sunday          00:00-03:00,07:00-24:00
        }

define stuff{
        hostgroup_name                  things
        service_description             load
        dependent_service_description   cpu_util
        execution_failure_criteria      n
        notification_failure_criteria   w,u,c
        }

define host{
        use                     things
        host_name               HOST_A
        0alias                  stuff 
       }

define host{
        use                     things
        host_name               HOST_B
        alias                   ughj
        address                 1.6.7.6
       }

define host{
        use                     things
        host_name               HOST_C
       }

The match runs from the first "define host" to HOST_B's closing brace. It does not include HOST_C's block (which is correct), but I want only HOST_B's block, not everything before it.

Any help? My regex is rusty. You can test on http://regexpal.com/


Solution

  • I have not tested it, but I think you need to replace .* with [^{]*. That way your regex cannot consume the next "{".

    This also looks strange to me: (.*\n)*? Have a look at re.DOTALL: with that flag set, the dot also matches newlines.
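
The suggestion above can be tried with a short script. This is only a sketch: the sample text is a shortened copy of the input from the question, and the names `text`, `original`, `fixed`, `broad` and `block` are made up for illustration. The lazy quantifiers in the original pattern only shorten where the match ends; the search still begins at the first "define host", so the match spans several blocks. A negated class such as [^{]* cannot step over the "{" that opens the next block, which forces the match to start at the right "define host".

```python
import re

# Shortened copy of the sample input from the question.
text = """\
define host{
    field stuff
        }

define timeperiod{
        sunday          00:00-03:00,07:00-24:00
        }

define host{
        use                     things
        host_name               HOST_A
       }

define host{
        use                     things
        host_name               HOST_B
        alias                   ughj
        address                 1.6.7.6
       }

define host{
        use                     things
        host_name               HOST_C
       }
"""

# Pattern from the question: lazy, but "." plus "\n" can still walk
# line by line across block boundaries.
original = re.compile(
    r"define host( |\t)*{(.*\n)*?( |\t)*host_name( |\t)*HOST_B(.*\n)*?( |\t)*}"
)

# Variant following the answer: [^{]* matches anything (newlines included)
# except "{", so it cannot run into the next "define ...{" block.
fixed = re.compile(r"define host[ \t]*\{[^{]*host_name[ \t]*HOST_B[^{]*\}")

broad = original.search(text)
block = fixed.search(text)

print("original pattern spans", broad.group(0).count("define host"), "host blocks")
print(block.group(0))
```

Because [^{] already matches newlines, this variant does not need re.DOTALL. Using [^{}]* instead would also stop the pattern from crossing a closing brace, which may be safer if a later block could contain the same host_name line.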