python, regex, python-re

Regular expression for comments in Python re


I want to identify comments with a regular expression, using the re module from the standard library. The problem is that my single-line and multiline comments start with the same token.

One-line comment:
#= this is a comment

some code here

#= this is a 
multiline comment =#

I've been trying to write one (or more) regular expressions that capture both of them. I've got r'(#=)[\w ]*' for the single-line comment, but I've been unsuccessful with the multiline comment.

Can you help me with this?


Solution

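  • It has already been pointed out in the comments that this syntax is not ideal: because both comment types open with #=, a single-line comment can only be told apart from a multiline one by the absence of a closing =#. You can, however, parse your comments using a negative lookahead: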

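    import re

    s = """single-line comment:
    #= this is a comment
    
    some code here
    
    #= this is a 
    multiline comment =#
    
    #=single comment at the end"""
    
    # First alternative: multiline comment (#= ... =#); second: single-line comment.
    # re.DOTALL lets "." match newlines so the first alternative can span lines.
    pattern = re.compile(r'#=(?:(?!#=).)*?=#|#=.*?(?=\n|$)', re.DOTALL)
    result = pattern.findall(s)
    print(result)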
    
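    • #=(?:(?!#=).)*?=# captures everything between #= and the next =# (multiline comment). The negative lookahead (?!#=) is checked before each character, so the match cannot run past the start of another comment; this keeps a single-line comment from being swallowed into a later multiline match.
    • #=.*?(?=\n|$) captures single-line comments: it stops at the first newline, and $ ensures that a single-line comment is captured even at the end of the file.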
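
To see what the lookahead buys you, here is a small sketch that compares the pattern above with a variant that drops the (?!#=) guard. The "naive" pattern and the variable names are only assumptions made for this illustration; the sample text is a trimmed copy of the one above.

```python
import re

# Trimmed copy of the sample text from the answer.
s = """#= this is a comment

some code here

#= this is a
multiline comment =#

#=single comment at the end"""

# Hypothetical variant for comparison: no (?!#=) guard in the multiline branch.
naive = re.compile(r'#=.*?=#|#=.*?(?=\n|$)', re.DOTALL)
# Pattern from the answer, with the guard.
guarded = re.compile(r'#=(?:(?!#=).)*?=#|#=.*?(?=\n|$)', re.DOTALL)

naive_matches = naive.findall(s)
guarded_matches = guarded.findall(s)

# Without the guard, the first single-line comment runs on until the "=#"
# that closes the multiline comment, so two comments collapse into one match.
print(len(naive_matches), len(guarded_matches))
for match in guarded_matches:
    print(repr(match))
```

With the guard in place, the multiline branch gives up as soon as it meets another #=, and the single-line branch takes over for that comment.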

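
If you also need to know which kind of comment each match is, one option is to wrap the two alternatives in named groups and read the match object's lastgroup attribute. This is only a sketch: the group names "multi" and "single" and the short sample string are made up for the illustration.

```python
import re

# Same two alternatives as in the answer, wrapped in named groups.
# The group names "multi" and "single" are hypothetical, chosen for this example.
pattern = re.compile(
    r'(?P<multi>#=(?:(?!#=).)*?=#)|(?P<single>#=.*?(?=\n|$))',
    re.DOTALL,
)

s = "#= one line\ncode\n#= two\nlines =#\n#=last"

# m.lastgroup holds the name of the alternative that matched.
kinds = [(m.lastgroup, m.group()) for m in pattern.finditer(s)]
for kind, text in kinds:
    print(kind, repr(text))
```

finditer is used instead of findall here because findall would return a tuple of group contents per match once the pattern contains more than one capturing group.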