python, web-scraping, beautifulsoup

Scrape address using BeautifulSoup for Python


I am having difficulty scraping the address from the following page. Please help me scrape it.

http://www.salatomatic.com/d/Revesby+17154+Ahlus-Sunnah-Wal-Jamaah-Revesby

The relevant source of the page above is as follows:

<td width="100%"><div class="titleBM">Bankstown Masjid </div>Meredith Street, Bankstown, New South Wales 2200</td>

I am trying to scrape the text immediately after </div>.

My current code is incomplete, but looks like this:

content1 = urllib2.urlopen(url1).read()
soup1 = BeautifulSoup(content1)
div1 = soup1.find('div', {'class':'titleBM'}) #get the div where it's located
span1 = div1.find('</div>')
pos1 = span1.text       

print datetime.datetime.now(), 'street address:  ', pos1

Solution

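  • The address is a bare text node that follows the <div>, not an element inside it, so div1.find('</div>') cannot reach it. The text is the next sibling of the <div> element, so use next_sibling: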

    from bs4 import BeautifulSoup
    import urllib2
    import datetime
    
    url1 = 'http://www.salatomatic.com/d/Revesby+17154+Ahlus-Sunnah-Wal-Jamaah-Revesby'
    
    content1 = urllib2.urlopen(url1).read()
    soup1 = BeautifulSoup(content1, 'html.parser')  # name the parser explicitly
    div1 = soup1.find('div', {'class': 'titleBM'})  # the <div> that holds the title
    pos1 = div1.next_sibling  # the text node right after </div>
    
    print datetime.datetime.now(), 'street address:  ' , pos1
    

    Run it like:

    python2 script.py
    

    It yields:

    2013-12-03 12:55:41.306271 street address:   9-11 Mavis Street, Revesby, New South Wales 2212
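
    For readers on Python 3, where urllib2 no longer exists, the same next_sibling idea can be sketched against the HTML fragment quoted in the question, with no network access. This is only a minimal sketch: the helper name address_after_title is hypothetical, and it assumes the address is always the text node directly after the titleBM div.

```python
from bs4 import BeautifulSoup

# HTML fragment quoted in the question, used here instead of a live request.
HTML = ('<td width="100%"><div class="titleBM">Bankstown Masjid </div>'
        'Meredith Street, Bankstown, New South Wales 2200</td>')


def address_after_title(html):
    """Return the text following <div class="titleBM">, or None if absent.

    Hypothetical helper: assumes the address is the node right after the div.
    """
    soup = BeautifulSoup(html, 'html.parser')
    div = soup.find('div', class_='titleBM')
    if div is None or div.next_sibling is None:
        return None
    # next_sibling is the text node after </div>; strip stray whitespace.
    return str(div.next_sibling).strip()


if __name__ == '__main__':
    print('street address:', address_after_title(HTML))
```

    Returning None when the div or its sibling is missing avoids an AttributeError if the page layout changes.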