Tags: python, regex, python-2.7, multiline, non-greedy

Strip multiline $Log keyword expansion with python re


I have a large number of files with $Log expanded-keyword text at the end that needs to be deleted. I am looking to modify an existing python 2.7 script to do this but cannot get the regex working correctly.

The text to strip from the end of a file looks like this:

/*
one or more lines of ..
.. possible text
$Log: oldfile.c,v $
Revision 11.4  2000/01/20 19:01:41  userid
a bunch more text ..
.. of unknown number of lines
*/

I want to strip all of the text shown above, including the comment anchors /* and */ and everything in between.

I looked at these questions/answers and a few others:

Python re.sub non-greedy mode ..

Python non-greedy regexes

The closest I have been able to get is with:

content = re.sub(re.compile(r'\$Log:.*', re.DOTALL), '', content)

This, of course, leaves behind the opening /*.
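A minimal reproduction of this first attempt, using a small hypothetical file body (an ordinary header comment, one line of code, then the trailing $Log comment). With re.DOTALL, `.*` runs from `$Log:` to the end of the string, so the `/*` line and any text between it and `$Log:` survive.

```python
import re

# Hypothetical file body: a normal header comment, some code, then the
# trailing $Log comment laid out like the sample above.
content = (
    "/* header */\n"
    "int x;\n"
    "/*\n"
    "possible text\n"
    "$Log: oldfile.c,v $\n"
    "Revision 11.4  2000/01/20 19:01:41  userid\n"
    "more text\n"
    "*/\n"
)

# re.DOTALL lets '.' match newlines, so '.*' consumes everything from '$Log:' to EOF.
result = re.sub(re.compile(r'\$Log:.*', re.DOTALL), '', content)
print(repr(result))  # '/* header */\nint x;\n/*\npossible text\n'
```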

The following deleted my whole sample test file because the file opens with a matching comment (I thought the non-greedy ? modifier would prevent this):

content = re.sub(re.compile(r'^/\*.*?\$Log:.*', re.DOTALL), '', content)
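Laziness only decides how far a match extends from a given starting point; it does not move the starting point. The engine tries start positions from left to right and returns the first one that succeeds, and here offset 0 (where `^` also matches) already succeeds. A small sketch with a hypothetical two-comment file body:

```python
import re

# Hypothetical file body: a header comment, some code, then the $Log comment.
content = (
    "/* header */\n"
    "int x;\n"
    "/*\n"
    "$Log: oldfile.c,v $\n"
    "more text\n"
    "*/\n"
)

pattern = re.compile(r'^/\*.*?\$Log:.*', re.DOTALL)
m = pattern.search(content)

# The leftmost attempt succeeds: the lazy '.*?' keeps growing past the
# header's '*/' until it reaches '$Log:', and the greedy '.*' at the end
# then runs to the end of the string.
print(m.start())                        # 0
print(m.end() == len(content))          # True
print(repr(pattern.sub('', content)))   # ''
```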

I experimented with using re.MULTILINE without success.
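re.MULTILINE only changes what `^` and `$` match (the start and end of every line instead of only the whole string); it does not stop the engine from starting at the first `/*`. A sketch with the same kind of hypothetical input, comparing MULTILINE alone against MULTILINE combined with DOTALL:

```python
import re

# Hypothetical file body: a header comment, some code, then the $Log comment.
content = (
    "/* header */\n"
    "int x;\n"
    "/*\n"
    "$Log: oldfile.c,v $\n"
    "more text\n"
    "*/\n"
)

# MULTILINE alone: '^' may match at any line start, but without DOTALL '.'
# cannot cross a newline, so '/*' and '$Log:' would have to share one line.
multiline_only = re.sub(r'^/\*.*?\$Log:.*', '', content, flags=re.MULTILINE)

# MULTILINE | DOTALL: offset 0 is still tried first and still succeeds,
# so the whole string is removed exactly as before.
both = re.sub(r'^/\*.*?\$Log:.*', '', content, flags=re.MULTILINE | re.DOTALL)

print(multiline_only == content)  # True: nothing matched
print(repr(both))                 # ''
```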

How can a regex be defined in Python to grab the whole $Log comment -- AND none of the previous comments in the file?


Solution

  • You can use:

    import re
    content = re.sub(r"/\*\s+\*+\s+\$Log.*?\*/", "", content, flags=re.DOTALL)

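Note that the answer's pattern assumes a particular layout: after `/*` it requires whitespace, one or more `*`, more whitespace, and then `$Log`. That fits a comment whose lines are prefixed with ` * ` and whose first line is `$Log`, but it does not match the unprefixed sample in the question, which has text before `$Log`. The sketch below compares it with an alternative technique that is not part of the answer above: a "tempered" token `(?:(?!\*/).)` after `/*` consumes any character that does not begin `*/`, so a match can never run out of one comment into the next. The file bodies and the names ANSWER_RE and LOG_COMMENT_RE are hypothetical.

```python
import re

# The answer's pattern: '/*', whitespace, one or more '*', whitespace, '$Log'.
ANSWER_RE = re.compile(r"/\*\s+\*+\s+\$Log.*?\*/", re.DOTALL)

# Alternative sketch: after '/*', the tempered token (?:(?!\*/).) takes any
# character that does not begin '*/', so the match cannot leak out of one
# comment into the next; it must then find '$Log:' and stops at the first '*/'.
LOG_COMMENT_RE = re.compile(r"/\*(?:(?!\*/).)*?\$Log:.*?\*/", re.DOTALL)

# Hypothetical file bodies: a normal header comment, some code, then the $Log
# comment, once with ' * ' line prefixes and once laid out like the question.
starred = (
    "/* header */\n"
    "int x;\n"
    "/*\n"
    " * $Log: oldfile.c,v $\n"
    " * Revision 11.4  2000/01/20 19:01:41  userid\n"
    " * a bunch more text\n"
    " */\n"
)
plain = (
    "/* header */\n"
    "int x;\n"
    "/*\n"
    "one or more lines of ..\n"
    ".. possible text\n"
    "$Log: oldfile.c,v $\n"
    "Revision 11.4  2000/01/20 19:01:41  userid\n"
    "a bunch more text ..\n"
    ".. of unknown number of lines\n"
    "*/\n"
)

starred_answer = ANSWER_RE.sub("", starred)        # $Log comment removed
plain_answer = ANSWER_RE.sub("", plain)            # no match: text precedes $Log
starred_general = LOG_COMMENT_RE.sub("", starred)  # $Log comment removed
plain_general = LOG_COMMENT_RE.sub("", plain)      # $Log comment removed

print(repr(starred_answer))   # '/* header */\nint x;\n\n'
print(plain_answer == plain)  # True
print(repr(plain_general))    # '/* header */\nint x;\n\n'
```

Both patterns leave the newline that followed `*/` in place; since the $Log comment sits at the end of the file, appending `\s*` to either pattern would also strip that trailing whitespace.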