python web-scraping beautifulsoup lxml yahoo-finance

Web scraping with BeautifulSoup or lxml.html


I have watched some webcasts and need help with the following. I have been using lxml.html, but Yahoo recently changed the structure of its pages.

Target page:

http://finance.yahoo.com/quote/IBM/options?date=1469750400&straddle=true

Using the Chrome inspector, I see the data in:

 //*[@id="main-0-Quote-Proxy"]/section/section/div[2]/section/section/table

then some more code

How do I get this data out into a list? I also want to switch to another stock, for example from "LLY" to "MSFT".
How do I switch between dates and get all months?


Solution

  • Based on the answer by @hoju:

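    import calendar
    from datetime import datetime
    
    import lxml.html
    
    exDate = "2014-11-22"   # expiration date, YYYY-MM-DD
    symbol = "LLY"          # ticker; change to e.g. "MSFT" for another stock
    
    # The date parameter is the expiration as a Unix timestamp (midnight UTC).
    dt = datetime.strptime(exDate, '%Y-%m-%d')
    ym = calendar.timegm(dt.utctimetuple())
    
    url = 'http://finance.yahoo.com/q/op?s=%s&date=%s' % (symbol, ym)
    doc = lxml.html.parse(url)
    # Select every body row of the options table.
    table = doc.xpath('//table[@class="details-table quote-table Fz-m"]/tbody/tr')
    
    # Collect each row as a list of cell strings, with thousands separators removed.
    rows = []
    for tr in table:
        d = [td.text_content().strip().replace(',', '') for td in tr.xpath('./td')]
        rows.append(d)
    
    print(rows)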
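
To switch between stocks or dates, only the symbol and the timestamp in the URL need to change. The following is a minimal sketch, not part of the original answer: the helper names `expiry_to_epoch` and `options_url` are hypothetical, the URL pattern is copied from the target page in the question, and the list of expiration dates is an assumption that you would have to supply yourself. It only builds the URLs; each one could then be fetched and parsed the same way as in the code above.

```python
import calendar
from datetime import datetime

# URL pattern copied from the target page in the question.
URL_TEMPLATE = "http://finance.yahoo.com/quote/%s/options?date=%d&straddle=true"


def expiry_to_epoch(date_str):
    """Convert a 'YYYY-MM-DD' expiration date to a Unix timestamp at midnight UTC."""
    dt = datetime.strptime(date_str, "%Y-%m-%d")
    return calendar.timegm(dt.timetuple())


def options_url(symbol, date_str):
    """Build the options URL for one symbol and one expiration (hypothetical helper)."""
    return URL_TEMPLATE % (symbol.upper(), expiry_to_epoch(date_str))


# Illustrative inputs: the symbols come from the question; the second
# expiration date is a placeholder, not a verified expiration.
symbols = ["LLY", "MSFT"]
expirations = ["2016-07-29", "2016-08-19"]

# One URL per (symbol, expiration) pair.
urls = [options_url(s, d) for s in symbols for d in expirations]
for u in urls:
    print(u)
```

For "2016-07-29" the computed timestamp is 1469750400, which matches the `date=` value in the question's URL.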