Tags: python, web-scraping, phantomjs, splinter

Web scraping dynamic content with Splinter module


I'm struggling to scrape a table from steamcommunity that is loaded dynamically through JavaScript. I'm using Python with Splinter and the headless browser PhantomJS.

Here is what I have come up with so far:

from splinter import Browser

browser = Browser('phantomjs')

url = 'https://steamcommunity.com/market/listings/730/%E2%98%85%20Karambit%20%7C%20Blue%20Steel%20(Battle-Scarred)'

browser.visit(url)
# Wait up to 5 seconds for the dynamically loaded table to appear
print(browser.is_element_present_by_xpath('//*[@id="market_commodity_buyreqeusts_table"]', wait_time=5))
price_table = browser.find_by_xpath('//*[@id="market_commodity_buyreqeusts_table"]/table/tbody/tr')

print(price_table)
print(price_table.first)
print(price_table.first.text)   # prints an empty string
print(price_table.first.value)  # prints an empty string
browser.quit()

The first call, is_element_present_by_xpath(), ensures that the table I'm interested in has loaded. Then I try to access the rows of that table.

As I understand from the Splinter documentation, find_by_xpath() returns an ElementList, which is essentially a normal list with some aliases provided.

price_table is an ElementList of all the rows of the table. The last two prints produce empty output, and I can't find any reason why the text property returns an empty string.

How could the elements of that table be accessed?


Solution

  • I tried the code with different browsers and always got empty text, but I found the expected data in each element's html. Maybe it is just a bug in Splinter.

    from splinter import Browser
    
    # Also tried, with the same empty text:
    #browser = Browser('firefox')
    #browser = Browser('phantomjs')

    # For Chrome, executable_path has to point at chromedriver, not the browser:
    # executable_path='/usr/bin/chromium-browser' raised an error,
    # executable_path='/usr/bin/chromedriver' worked.
    browser = Browser('chrome')

    url = 'https://steamcommunity.com/market/listings/730/%E2%98%85%20Karambit%20%7C%20Blue%20Steel%20(Battle-Scarred)'
    
    browser.visit(url)
    
    print(browser.is_element_present_by_xpath('//*[@id="market_commodity_buyreqeusts_table"]', wait_time = 5))
    
    price_table = browser.find_by_xpath('//*[@id="market_commodity_buyreqeusts_table"]/table/tbody/tr')
    
    for row in price_table:
        print('row html:', row.html)  # contains the expected data
        print('row text:', row.text)  # comes back empty
        for col in row.find_by_tag('td'):
            print('  col html:', col.html)  # contains the expected data
            print('  col text:', col.text)  # comes back empty
    
    browser.quit()
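
Since the data does show up in html, one possible workaround is to parse that string yourself instead of relying on text. The sketch below uses only the standard library. CellTextParser and rows_from_html are hypothetical helper names, not part of Splinter, and the sample strings are made up for illustration, not real Steam output. It assumes row.html gives the inner HTML of the row (the td cells without the surrounding tr), and it also accepts a whole table fragment.

```python
from html.parser import HTMLParser


class CellTextParser(HTMLParser):
    """Collect the text of every <td>/<th> cell in an HTML fragment, row by row."""

    def __init__(self):
        super().__init__()
        self.rows = []     # finished rows, each a list of cell strings
        self._row = None   # cells of the row currently being parsed
        self._cell = None  # text chunks of the cell currently being parsed

    def handle_starttag(self, tag, attrs):
        if tag == 'tr':
            self._row = []
        elif tag in ('td', 'th'):
            if self._row is None:
                self._row = []  # fragment starts inside a row (no <tr>)
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ('td', 'th') and self._cell is not None:
            # Join the chunks and collapse runs of whitespace
            self._row.append(' '.join(''.join(self._cell).split()))
            self._cell = None
        elif tag == 'tr' and self._row is not None:
            self.rows.append(self._row)
            self._row = None

    def close(self):
        super().close()
        if self._row:  # flush cells when the fragment had no closing </tr>
            self.rows.append(self._row)
            self._row = None


def rows_from_html(fragment):
    """Return a list of rows, each a list of cell texts, from an HTML fragment."""
    parser = CellTextParser()
    parser.feed(fragment)
    parser.close()
    return parser.rows


# Illustrative sample only, shaped like the inner HTML of one table row
sample = '<td align="right">$100.00</td><td align="right">3</td>'
print(rows_from_html(sample))
```

In the loop above this would be called as rows_from_html(row.html) for each row. Whether the empty text comes from Splinter itself or from the rows not being rendered as visible on the page is not established here, so treat this as a workaround rather than a fix for the underlying cause.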