Tags: python, regex, web-scraping, beautifulsoup

Python web scraping with regex


Could someone help me with some code I'm writing to pull stats from a game? I can load the HTML into BeautifulSoup, but I don't know how to write a regex that extracts one specific value from the whole page. Here's what I have so far:

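from urllib.request import urlopen  # Python 3; in Python 2 this was `from urllib import urlopen`
from bs4 import BeautifulSoup
import re

content = urlopen('http://www.worldoftanks.com/community/accounts/1000395103-FrankenTank').read()
soup = BeautifulSoup(content, 'html.parser')  # name the parser explicitly
print(soup)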

If you could show me how to pull one stat out, I can figure out the rest. One of the stats is Battles Participated (10103), which appears in the HTML like this:

<tr>
<td class=""> Battles Participated: </td>
<td class="td-number-nowidth"> 10 103 </td>
</tr>

Solution

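  • Searching the tree (no regex needed). find() returns the first <td> whose class is td-number-nowidth, or None if there is no such cell:

    battles = soup.find('td', 'td-number-nowidth')
    if battles:
        # get_text() keeps the surrounding spaces, so strip them
        print(battles.get_text().strip())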
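
If other stats on the page use the same td-number-nowidth class, taking the first match may not give you the stat you want. A minimal sketch of one way around that, assuming every stat row looks like the snippet in the question (a label cell followed by a value cell): find the label cell by its text, then read the next <td>. The helper name get_stat and the inline HTML are made up for illustration; the real page may be laid out differently. Regex is only used here to match the label and to strip the thousands separator.

```python
import re
from bs4 import BeautifulSoup

# Sample markup copied from the question; the <table> wrapper is an assumption.
html = """
<table>
<tr>
<td class=""> Battles Participated: </td>
<td class="td-number-nowidth"> 10 103 </td>
</tr>
</table>
"""

def get_stat(soup, label):
    """Return the number in the cell after the given label, or None (hypothetical helper)."""
    # Match the label cell by its text, ignoring surrounding whitespace.
    pattern = re.compile(r'^\s*' + re.escape(label) + r'\s*$')
    label_cell = soup.find('td', string=pattern)
    if label_cell is None:
        return None
    # The value sits in the next <td> of the same row.
    value_cell = label_cell.find_next_sibling('td')
    if value_cell is None:
        return None
    # Drop everything that is not a digit (e.g. the space in "10 103").
    digits = re.sub(r'\D', '', value_cell.get_text())
    return int(digits) if digits else None

soup = BeautifulSoup(html, 'html.parser')
battles = get_stat(soup, 'Battles Participated:')
print(battles)  # 10103
```

The same call should work for the other stats by changing the label string, provided their rows follow the same two-cell pattern.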