Tags: python, regex, python-2.7, non-greedy

Python Regex - non-greedy match does not work


I have a flat file in which each line holds one C++ function name and part of its declaration, like this:

virtual void NameSpace1::NameSpace2::ClassName1::function_name1(int arg1) const
void function_name2
void NameSpace2::NameSpace4::ClassName2::function_name3
function_name4

I am trying to extract the function names alone by using this line:

fn_name = re.match(":(.*?)\(?", lines)

I can understand why function_name2 and function_name4 do not match (there is no leading :). But even for function_name1 and function_name3, the match does not appear to be non-greedy. The output of fn_name.group() is

:NameSpace2::ClassName1::function_name1

I have three questions:

  1. I expected just the string "function_name1" to be extracted from line 1, but the non-greedy match does not seem to work. Why?
  2. Why is line 3 not being extracted?
  3. How do I get the function names from all the lines using a single regex?

Please help.


Solution

  • This works pretty well, with your example at least:

    ^(?:\w+ +)*(?:\w+::)*(\w+)
    

    i.e., in Python code:

    import re
    
    function_name = re.compile(r'^(?:\w+ +)*(?:\w+::)*(\w+)', re.MULTILINE)
    matches = function_name.findall(your_txt)
    
    # -> ['function_name1', 'function_name2', 'function_name3', 'function_name4']
    

    Takeaway: If you can do it with greedy matching, do it with greedy matching.
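
As for why the non-greedy attempt falls short: a lazy quantifier only shortens where a match ends. The engine still starts at the leftmost position where a match is possible, which here is the first colon. The sketch below illustrates this under the assumption that re.search is used (re.match only tries the start of the string); the variable names are illustrative and not from the question.

```python
import re

line = ("virtual void NameSpace1::NameSpace2::ClassName1::"
        "function_name1(int arg1) const")

# re.match only tries position 0, and the line does not start with ':'.
assert re.match(r":(.*?)\(", line) is None

# Lazy group with a required '(': the match still begins at the FIRST ':'
# and grows until the first '(' is reached.
lazy_required = re.search(r":(.*?)\(", line)
print(lazy_required.group(1))   # :NameSpace2::ClassName1::function_name1

# With an optional '(' the lazy group is free to stay empty.
lazy_optional = re.search(r":(.*?)\(?", line)
print(repr(lazy_optional.group(1)))   # ''

# A greedy '.*' runs to the end of the line and backtracks to the LAST ':',
# which leaves only the final name for the capture group.
greedy = re.search(r".*:(\w+)", line)
print(greedy.group(1))   # function_name1
```

The greedy variant in this sketch still needs a colon somewhere in the line, so it does not cover function_name2 or function_name4; the anchored pattern above handles those as well.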


    Note that \w is not strictly correct for a C identifier (for one thing, it allows a leading digit), but spelling out the technically correct character class is beside the point here. Find and use the correct set of characters instead of \w.
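
As a rough sketch of what the note suggests, the pattern can be rebuilt around an explicit identifier class. This assumes ASCII-only identifiers (a letter or underscore first, then letters, digits, or underscores), which may be narrower than what a given compiler accepts. The names IDENT, FUNCTION_NAME, and sample are illustrative.

```python
import re

# Assumption: ASCII-only identifiers; adjust the class if your code base
# uses other characters.
IDENT = r"[A-Za-z_][A-Za-z0-9_]*"

# Same shape as the pattern above, with \w+ swapped for IDENT:
# leading keywords, then namespace/class qualifiers, then the name itself.
FUNCTION_NAME = re.compile(
    r"^(?:{id} +)*(?:{id}::)*({id})".format(id=IDENT),
    re.MULTILINE,
)

sample = """\
virtual void NameSpace1::NameSpace2::ClassName1::function_name1(int arg1) const
void function_name2
void NameSpace2::NameSpace4::ClassName2::function_name3
function_name4
"""

names = FUNCTION_NAME.findall(sample)
print(names)
# -> ['function_name1', 'function_name2', 'function_name3', 'function_name4']
```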