
How to re.compile multiple regular expressions


I am using Python 2.7.11. For example, I have two expressions:

pattern_movie_name=re.compile(r'<span class="title">(.*?)</span>')
pattern_movie_Englishname=re.compile(r'<span class="title">&nbsp;/&nbsp;(.*?)</span>')

How can I combine them into one expression? I tried:

pattern_movie_all=re.compile(r'<span class="title">(.*?)</span>'+r'<span class="title">&nbsp;/&nbsp;(.*?)</span>')

This doesn't work!


Solution

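  • Use the alternation operator (|). The order of the alternatives matters: the regex engine tries them left to right, so the more specific pattern (the one with &nbsp;/&nbsp;) must come first. Otherwise the general pattern also matches the English-name spans.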

    re.compile(r'<span class="title">&nbsp;/&nbsp;(.*?)</span>|<span class="title">(.*?)</span>')
    

    If the span content can contain a newline character, use the DOTALL modifier (?s) so that . also matches newlines.

    re.compile(r'(?s)<span class="title">&nbsp;/&nbsp;(.*?)</span>|<span class="title">(.*?)</span>')
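
As a rough illustration of the points above, here is a minimal sketch of how the combined pattern behaves with findall. The sample markup and the variable names other than pattern_movie_all are made up for the example, and re.DOTALL is passed as a flag, which should behave the same as the inline (?s) modifier.

```python
import re

# Hypothetical sample markup, modelled on the two patterns in the question.
html = (
    '<span class="title">Movie A</span>\n'
    '<span class="title">&nbsp;/&nbsp;Film A</span>\n'
)

specific = r'<span class="title">&nbsp;/&nbsp;(.*?)</span>'
general = r'<span class="title">(.*?)</span>'

# Specific alternative first, general alternative as the fallback.
pattern_movie_all = re.compile(specific + '|' + general)

# With two groups, findall returns (english_name, name) tuples; the group
# of the alternative that did not take part in the match is an empty string.
matches = pattern_movie_all.findall(html)
names = [name for english, name in matches if name]
english_names = [english for english, name in matches if english]

# Reversed order: the general alternative wins every time, so the
# "&nbsp;/&nbsp;" prefix ends up inside the captured text.
wrong_order = re.compile(general + '|' + specific).findall(html)

# A title that spans two lines (hypothetical): "." does not match "\n"
# by default, so the plain pattern finds nothing; DOTALL fixes that.
multiline = '<span class="title">Movie\nB</span>'
without_dotall = pattern_movie_all.findall(multiline)
with_dotall = re.compile(specific + '|' + general, re.DOTALL).findall(multiline)

print(names)
print(english_names)
print(wrong_order)
```

Since each match fills only one of the two groups, filtering out the empty strings as above is one simple way to separate the two kinds of title. Note that this filter would also drop a span whose title is genuinely empty.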