python regex backslash curly-brackets

Working with the characters backslash and curly brackets in regular expressions in Python


I am using regular expressions to recognize the lines containing \begin{frame} in .tex files. Below is my code:

#!/usr/bin/python

import re,sys

def isEven(num):
    res = [False,True][bool(num % 2 == 0)]
    return res

textin = open(sys.argv[1]).readlines()
nline = 0
pat = r'\b\begin{frame}\b'
for line in textin:
    line = line.strip(' ')
    #print 'Test: ',line[:13]
    if re.match(pat,line):
        print 'here'
        nline += 1
    if isEven(nline):
        print '%',line.strip('\n')
    else:
        print line.strip('\n')

This program aims to add the character '%' before the lines in the tex file if the number of frames seen so far is even. In other words, I want to comment out the slides whose slide number is even.

Do you know what is wrong with the pattern?


Solution

  • Look at your pattern string again:

    r'\b\begin{frame}\b'
    

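    Notice it starts with '\b\b'. You mean the first one as a word boundary and the second one as part of what you want to match, but how could re possibly guess which is which?! It reads both as word boundaries, so the pattern ends up looking for the literal text egin{frame}.

    I don't think you need the word boundaries, by the way; in fact they get in the way of the matching. A \b next to the backslash only succeeds if a word character comes right before it, and a \b after the closing brace only succeeds if a word character comes right after it. Moreover, re.match only matches at the start of the string; since you say "contain", as opposed to "start with", in your Q's text, you may actually want re.search.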

    To match a backslash, you need to double it in the pattern. And you can use a single backslash to escape punctuation, such as those braces.

    So I would recommend...:

    import re, sys

    def isEven(n): return n%2 == 0

    nline = 0
    # \\ matches one literal backslash; \{ and \} match literal braces
    pat = r'\\begin\{frame\}'
    with open(sys.argv[1]) as textin:
        for line in textin:
            line = line.strip()
            if re.search(pat,line):
                print('here')
                nline += 1
            if isEven(nline):
                print('% ' + line)
            else:
                print(line)
    

    I've made a few more improvements, but they're not directly relevant to your Q (e.g., using with to open the file and looping over it line by line; stripping each line of whitespace completely, once, rather than by instalments). You don't have to use any of these :-).
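
To check the points above without a .tex file at hand, here is a small sketch that compares the original pattern with the escaped one on a single string. The sample line and the names sample, broken, and fixed are made up for illustration. re.escape is a standard-library helper that the answer above does not use; it is shown only as another way to build the same pattern.

```python
import re

# Hypothetical sample line, indented the way it might be in a .tex file.
sample = r'  \begin{frame}[fragile]'

# Original pattern: both \b are word boundaries, so the literal part that
# is left to match is "egin{frame}".
broken = r'\b\begin{frame}\b'

# Escaped pattern: \\ is one literal backslash, \{ and \} are literal braces.
fixed = r'\\begin\{frame\}'

# The original pattern finds nothing in a real \begin{frame} line ...
broken_on_sample = re.search(broken, sample)
# ... but it does match this unrelated text, which shows how it is parsed.
broken_on_other = re.search(broken, 'egin{frame}y')

# re.search scans the whole line; re.match only tries at position 0, so the
# leading spaces make it fail unless the line is stripped first.
found = re.search(fixed, sample)
at_start = re.match(fixed, sample)
at_start_stripped = re.match(fixed, sample.strip())

# re.escape builds the escaped pattern from the literal text.
escaped = re.escape(r'\begin{frame}')

print(broken_on_sample is None)       # True
print(broken_on_other is not None)    # True
print(found.group(0))                 # \begin{frame}
print(at_start is None)               # True
print(at_start_stripped is not None)  # True
print(escaped == fixed)               # True
```

Whether to write the escapes by hand or call re.escape is mostly a matter of taste for a pattern this short; re.escape becomes more useful when the literal text comes from a variable.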