Tags: python, web-scraping, beautifulsoup, urllib2, mechanize

Open a page programmatically in Python


Can you extract the VIN number from this webpage?

I tried urllib2.build_opener, requests, and mechanize. I also set a User-Agent header, but none of them could see the VIN.

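import urllib2  # Python 2 only
from bs4 import BeautifulSoup

# `link` holds the URL of the vehicle details page
opener = urllib2.build_opener()
opener.addheaders = [('User-agent', 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_6_7) '
                                    'AppleWebKit/535.1 (KHTML, like Gecko) '
                                    'Chrome/13.0.782.13 Safari/535.1')]
page = opener.open(link)
soup = BeautifulSoup(page, 'html.parser')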

table = soup.find('dd', attrs = {'class': 'tip_vehicleStats'})
vin = table.contents[0]
print vin

Solution

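  • You can use a browser automation tool for this. The VIN is most likely filled in by JavaScript after the page loads, which would explain why urllib2, requests, and mechanize never see it; a real browser runs that script.

    For example, this simple Selenium script does the job: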

    from selenium import webdriver
    from bs4 import BeautifulSoup

    link = "https://www.iaai.com/Vehicles/VehicleDetails.aspx?auctionID=14712591&itemID=15775059&RowNumber=0"
    browser = webdriver.Firefox()
    browser.get(link)  # the browser runs the page's JavaScript
    page = browser.page_source
    browser.quit()

    soup = BeautifulSoup(page, "html.parser")

    table = soup.find('dd', attrs = {'class': 'tip_vehicleStats'})
    vin = table.span.contents[0]  # text inside the <span>
    print(vin)
    

    BTW, table.contents[0] prints the entire span, including the span tags.

    table.span.contents[0] prints only the VIN.
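
To see the difference between the whole span and its text without launching a browser, here is a minimal sketch run against a static snippet. The markup and the VIN value are made up for illustration and are not taken from the real page; only the `dd` class name comes from the code above. Note that `table.contents` is a plain list, so the span is reached through `table.span`.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for the rendered page; the VIN is a placeholder.
html = '<dl><dd class="tip_vehicleStats"><span>1HGCM82633A004352</span></dd></dl>'

soup = BeautifulSoup(html, "html.parser")
table = soup.find("dd", attrs={"class": "tip_vehicleStats"})

first_child = table.contents[0]  # the <span> tag itself, tags included
vin = table.span.contents[0]     # the text node inside the span

print(first_child)
print(vin)
```

On a real page there may be whitespace between the `dd` and the `span`, in which case `table.contents[0]` would be that whitespace rather than the span. `table.span` looks up the first span regardless of position, so it is probably the safer choice.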