Tags: python, regex, string, file-io, quotes

Why does my regex not work on input from file.read()?


I have a section of code that I need to remove from multiple files that starts like this:

<?php
//{{56541616

and ends like this:

//}}18420732
?>

where each of the two codes can be any sequence of letters and digits (and the two codes are not the same).

I wrote a Python function that returns the entire input string with this block removed:

import re

def removeInsert(text):
    # re.DOTALL lets ".*" match across the newlines inside the block.
    m = re.search(r"<\?php\n\/\/\{\{[a-zA-Z0-9]{8}.*\/\/\}\}[a-zA-Z0-9]{8}\n\?>", text, re.DOTALL)
    return text[:m.start()] + text[m.end():]

This works when I call it as removeInsert("""[file text]"""), pasting the file contents between triple quotes so the string can span multiple lines.

I then tried to extend this to open a file and pass its contents to removeInsert():

def fileRW(filename):
    input_file = open(filename, 'r')
    text = input_file.read()
    newText = removeInsert(text)
    ...

However, when I run fileRW([input-file]), I get this error:

return text[:m.start()] + text[m.end():]
AttributeError: 'NoneType' object has no attribute 'start'
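re.search() returns None when the pattern matches nowhere in the string, and None has no start() method, so this AttributeError means the regex found no match in the text read from the file. A minimal sketch of a guarded version follows; the name remove_insert_guarded and the choice to return the text unchanged when nothing matches are assumptions for illustration, not part of the original code. The sample strings are made up.

```python
import re

# Same pattern as removeInsert() (minus the unnecessary "\/" escapes);
# it still expects bare "\n" line endings.
PATTERN = re.compile(
    r"<\?php\n//\{\{[a-zA-Z0-9]{8}.*//\}\}[a-zA-Z0-9]{8}\n\?>",
    re.DOTALL,
)

def remove_insert_guarded(text):
    """Hypothetical variant of removeInsert() that does not crash on no match."""
    m = PATTERN.search(text)
    if m is None:
        # No match: return the input untouched instead of calling .start() on None.
        return text
    return text[:m.start()] + text[m.end():]

lf_text = "a\n<?php\n//{{56541616\nbad\n//}}18420732\n?>\nb"
crlf_text = lf_text.replace("\n", "\r\n")

print(repr(remove_insert_guarded(lf_text)))            # 'a\n\nb'
print(remove_insert_guarded(crlf_text) == crlf_text)   # True: no match, nothing removed
```

The guard only stops the crash; it does not explain why the file text fails to match.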

I can confirm that text in the last snippet is a string and does contain the block, but removeInsert() does not match it. My best guess is that this is related to the triple quotes I use when passing the string in manually: perhaps the text that fileRW() passes to removeInsert() is not "triple-quoted". I have tried forcing triple quotes onto it (adding "\"\"\"" to the string), but that doesn't work either. I have no idea how to fix this and couldn't find anything about it by searching Google. Any suggestions?
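Triple quotes are only source-code syntax for writing a string literal across several lines; they are not stored in the resulting str, so there is nothing to "add" to the string returned by read(). A quicker way to see what the regex actually receives is to print repr() of the text, which makes hidden characters such as carriage returns visible. The strings below are made-up samples.

```python
# A triple-quoted literal and a literal with an explicit "\n" escape
# produce exactly the same string.
triple = """<?php
//{{56541616"""
escaped = "<?php\n//{{56541616"
print(triple == escaped)  # True

# Text saved with Windows line endings contains "\r\n" instead of "\n".
# repr() shows the otherwise invisible "\r".
from_windows_file = "<?php\r\n//{{56541616"
print(repr(from_windows_file))       # '<?php\r\n//{{56541616'
print(from_windows_file == escaped)  # False
```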


Solution

  • Your regex only matches \n as a line ending. Your text editor may save lines with a carriage return followed by a newline (\r\n), the Windows convention. Try changing each \n in your regex to (\r\n|\r|\n).
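A minimal sketch of this fix, written as a hypothetical remove_insert_any_eol() function. It uses the non-capturing group (?:\r\n|\r|\n), which accepts the same line endings as the group suggested above without creating a capture group. One caveat: in Python 3, open(filename, 'r') uses universal newlines by default (newline=None), so \r\n and \r are already translated to \n on read; the mismatch is therefore most likely under Python 2, or when a file is opened in binary mode or with newline=''. The temporary file below exists only to demonstrate both ways of reading.

```python
import os
import re
import tempfile

# Hypothetical fixed version of removeInsert(): each line break may be
# "\r\n" (Windows), "\r" (old Mac) or "\n" (Unix).
EOL = r"(?:\r\n|\r|\n)"
PATTERN = re.compile(
    r"<\?php" + EOL + r"//\{\{[a-zA-Z0-9]{8}.*//\}\}[a-zA-Z0-9]{8}" + EOL + r"\?>",
    re.DOTALL,
)

def remove_insert_any_eol(text):
    m = PATTERN.search(text)
    if m is None:
        return text  # nothing to remove
    return text[:m.start()] + text[m.end():]

sample = "a\r\n<?php\r\n//{{56541616\r\nbad\r\n//}}18420732\r\n?>\r\nb"
print(repr(remove_insert_any_eol(sample)))  # 'a\r\n\r\nb'

# Write the CRLF sample as raw bytes, then read it back two ways.
with tempfile.NamedTemporaryFile("wb", suffix=".php", delete=False) as f:
    f.write(sample.encode("ascii"))
    path = f.name
try:
    with open(path, "r") as fh:              # Python 3 default: newline=None
        translated = fh.read()               # "\r\n" is turned into "\n"
    with open(path, "r", newline="") as fh:  # line endings left untouched
        untranslated = fh.read()
finally:
    os.remove(path)

print("\r" in translated)    # False
print("\r" in untranslated)  # True
print(repr(remove_insert_any_eol(translated)))  # 'a\n\nb'
```

The pattern matches either way, so the same function works whether or not the line endings were translated on read.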