I have a flat file with one C++ function name, and part of its declaration, per line, like this:
virtual void NameSpace1::NameSpace2::ClassName1::function_name1(int arg1) const
void function_name2
void NameSpace2::NameSpace4::ClassName2::function_name3
function_name4
I am trying to extract the function names alone by using this line:
fn_name = re.match(":(.*?)\(?", lines)
I can understand why function_name2 and function_name4 do not match (because there is no leading :). But I am seeing that even for function_name1 and function_name3, it does not do a non-greedy match. The output of fn_name.group() is
:NameSpace2::ClassName1::function_name1
I have three questions:
I expected "function_name1" to be extracted from line 1, but the non-greedy match does not seem to work. Why? Please help.
This works pretty well, with your example at least:
^(?:\w+ +)*(?:\w+::)*(\w+)
i.e., in Python code:
import re
function_name = re.compile(r'^(?:\w+ +)*(?:\w+::)*(\w+)', re.MULTILINE)
matches = function_name.findall(your_txt)
# -> ['function_name1', 'function_name2', 'function_name3', 'function_name4']
Takeaway: If you can do it with greedy matching, do it with greedy matching.
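To illustrate the takeaway, here is a small sketch comparing the lazy pattern from the question with the greedy one above (the variable names are made up for this example). As far as I can tell, two things work against the lazy version: re.match only tries the very start of the string, and a lazy group followed only by an optional \(? is satisfied by matching nothing at all.

```python
import re

line = ("virtual void NameSpace1::NameSpace2::ClassName1::"
        "function_name1(int arg1) const")

# re.match only tries position 0; the line does not start with ':',
# so there is no match at all.
anchored = re.match(r":(.*?)\(?", line)

# re.search finds the first ':'. The lazy group then matches as little as
# possible (nothing), because the optional \(? may also match nothing.
lazy = re.search(r":(.*?)\(?", line)

# Greedy pattern from this answer: skip "word " prefixes and "name::"
# qualifiers, then capture the last identifier.
greedy = re.search(r"^(?:\w+ +)*(?:\w+::)*(\w+)", line)

print(anchored)              # None
print(repr(lazy.group(1)))   # ''
print(greedy.group(1))       # function_name1
```

The greedy version needs no lookahead for the opening parenthesis: the capture simply stops at the first character that cannot be part of a name.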
Note that \w is not strictly correct for a C identifier (it also accepts a leading digit, for example), but writing down the technically correct character class is beside the point here. Find and use the correct set of characters instead of \w.
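One possible stricter variant, as a sketch only: it assumes plain ASCII identifiers (a letter or underscore followed by letters, digits, or underscores) are all you need. IDENT, FUNCTION_NAME, and sample are names invented for this example, and operators or destructors are not handled.

```python
import re

# Assumption: ASCII-only identifiers that may not start with a digit.
IDENT = r"[A-Za-z_][A-Za-z0-9_]*"

# Same structure as the pattern above, with IDENT substituted for \w+.
FUNCTION_NAME = re.compile(
    rf"^(?:{IDENT} +)*(?:{IDENT}::)*({IDENT})",
    re.MULTILINE,
)

# The four sample lines from the question.
sample = """\
virtual void NameSpace1::NameSpace2::ClassName1::function_name1(int arg1) const
void function_name2
void NameSpace2::NameSpace4::ClassName2::function_name3
function_name4
"""

print(FUNCTION_NAME.findall(sample))
# -> ['function_name1', 'function_name2', 'function_name3', 'function_name4']
```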