I want to identify comments with a regular expression, using the re
module from the standard library. The problem is that my single-line and multiline comments start with the same token.
One-line comment:
#= this is a comment
some code here
#= this is a
multiline comment =#
I've been trying to write one (or more) regular expressions that capture both of them. I've got r'(#=)[\w ]*'
for the single-line comment, but I've been unsuccessful with the multiline comment.
Can you help me with this?
It has already been pointed out in the comments that this syntax is not ideal. You can, however, parse your comments using a negative lookahead:
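import re
s = """uniline comment:
#= this is a comment
some code here
#= this is a
multiline comment =#
#=single comment at the end"""
# First alternative: multiline comment; second alternative: single-line comment.
# re.DOTALL lets "." match newlines, which the multiline alternative needs.
pattern = re.compile(r'#=(?:(?!#=).)*?=#|#=.*?(?=\n|$)', re.DOTALL)
result = pattern.findall(s)
print(result)
# ['#= this is a comment', '#= this is a\nmultiline comment =#', '#=single comment at the end']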
#=(?:(?!#=).)*?=# captures everything between #= and the next =# (multiline comment). We exclude #= to avoid capturing single-line comments in our multiline match.
#=.*?(?=\n|$) captures single-line comments ($ ensures single-line comments are captured even at the end of the file).
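If you also need to know which kind of comment each match is, the same two alternatives can be wrapped in named groups. This is only a minimal sketch and not part of the pattern above: the group names multi and single, the COMMENT_RE constant and the find_comments helper are illustrative choices, and re.VERBOSE is used purely so the pattern can be annotated (which is why every # in the pattern is escaped).

```python
import re

# Same two alternatives as the one-line pattern, spelled out in verbose mode.
# Group names "multi" and "single" are hypothetical labels chosen for this sketch.
COMMENT_RE = re.compile(
    r"""
    (?P<multi>  \#= (?: (?!\#=) . )*? =\# )   # opener, then anything without a new opener, then closer
    |
    (?P<single> \#= .*? (?= \n | $ ) )        # opener up to the end of the line or of the input
    """,
    re.DOTALL | re.VERBOSE,
)


def find_comments(source):
    """Return a list of (kind, text) pairs, where kind is 'multi' or 'single'."""
    # Only one of the two named groups takes part in any given match,
    # so m.lastgroup is the name of the alternative that matched.
    return [(m.lastgroup, m.group()) for m in COMMENT_RE.finditer(source)]


if __name__ == "__main__":
    sample = "#= one\ncode\n#= two\nlines =#\n#=tail"
    for kind, text in find_comments(sample):
        print(kind, repr(text))
```

Keep in mind that the ambiguity of the syntax remains: a single-line comment followed by code that contains =# before the next #= will be reported as one multiline comment, because nothing in the text distinguishes the two cases.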