python web-scraping beautifulsoup urllib2 mechanize

Open a page programmatically in Python

Can you extract the VIN number from this webpage?

I tried urllib2.build_opener, requests, and mechanize, and I set a User-Agent header as well, but none of them returned HTML that contains the VIN.

import urllib2
from bs4 import BeautifulSoup

# link holds the URL of the vehicle details page
opener = urllib2.build_opener()
opener.addheaders = [('User-agent',
                      'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_6_7) '
                      'AppleWebKit/535.1 (KHTML, like Gecko) '
                      'Chrome/13.0.782.13 Safari/535.1')]
page = opener.open(link)
soup = BeautifulSoup(page)

table = soup.find('dd', attrs = {'class': 'tip_vehicleStats'})
vin = table.contents[0]
print vin

Solution

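The VIN is most likely filled in by JavaScript after the page loads. urllib2, requests, and mechanize only fetch the raw HTML and never run that script, which would explain why none of them see it. A browser automation tool loads the page in a real browser, so the scripts run before you read the HTML.

For example, this short Selenium script does the job: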

from selenium import webdriver
from bs4 import BeautifulSoup

link = "https://www.iaai.com/Vehicles/VehicleDetails.aspx?auctionID=14712591&itemID=15775059&RowNumber=0"
browser = webdriver.Firefox()
browser.get(link)
page = browser.page_source  # HTML after the browser has run the page's scripts
browser.quit()

soup = BeautifulSoup(page, "html.parser")

table = soup.find('dd', attrs = {'class': 'tip_vehicleStats'})
vin = table.span.contents[0]  # text inside the <span>, without the tags
print(vin)

BTW, table.contents[0] prints the entire span, including the span tags.

table.span.contents[0] prints only the VIN. Note that table.contents is a plain list, so the span has to be reached through table.span, not table.contents.span.
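
To see the difference between the two expressions without opening a browser, here is a minimal sketch that runs against a static snippet. The markup is only my assumption about what the rendered page looks like, and the VIN is a made-up placeholder, not a value from the site. Since contents is a plain list, the span is reached through table.span here.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for the rendered page; the real structure
# is assumed, and the VIN below is a placeholder.
html = '<dl><dd class="tip_vehicleStats"><span>EXAMPLEVIN1234567</span></dd></dl>'

soup = BeautifulSoup(html, "html.parser")
table = soup.find("dd", attrs={"class": "tip_vehicleStats"})

first_child = table.contents[0]  # the <span> tag itself, markup included
vin = table.span.contents[0]     # the text node inside the <span>

print(first_child)
print(vin)
```

The first print shows the span with its tags, the second shows only the text. If the real page nests the VIN differently, the table.span step would need adjusting.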