I'm struggling to scrape a table (from steamcommunity) that is loaded dynamically through JavaScript. I'm using Python's Splinter together with the headless browser PhantomJS.
Here is what I have come up with so far:
from splinter import Browser
import time
import sys
browser = Browser('phantomjs')
url = 'https://steamcommunity.com/market/listings/730/%E2%98%85%20Karambit%20%7C%20Blue%20Steel%20(Battle-Scarred)'
browser.visit(url)
print browser.is_element_present_by_xpath('//*[@id="market_commodity_buyreqeusts_table"]', wait_time = 5)
price_table = browser.find_by_xpath('//*[@id="market_commodity_buyreqeusts_table"]/table/tbody/tr')
print price_table
print price_table.first
print price_table.first.text
print price_table.first.value
browser.quit()
The first call, is_element_present_by_xpath(), ensures that the table I'm interested in is loaded. Then I try to access the rows of that table.
As I understand from the Splinter documentation, the .find_by_xpath() method returns an ElementList, which is essentially a normal list with some aliases provided.
price_table is an ElementList of all rows of the table. The last two prints give empty results, and I can't find any reason why .text returns an empty string.
How could the elements of that table be accessed?
I tried the code with different browsers and always got empty text, but I did find the expected data in html. Maybe it is just a bug in Splinter.
from splinter import Browser
#browser = Browser('firefox')
#browser = Browser('phantomjs')
#browser = Browser('chrome') # executable_path='/usr/bin/chromium-browser' ??? error !!!
browser = Browser('chrome') # executable_path='/usr/bin/chromedriver' OK
url = 'https://steamcommunity.com/market/listings/730/%E2%98%85%20Karambit%20%7C%20Blue%20Steel%20(Battle-Scarred)'
browser.visit(url)
print(browser.is_element_present_by_xpath('//*[@id="market_commodity_buyreqeusts_table"]', wait_time = 5))
price_table = browser.find_by_xpath('//*[@id="market_commodity_buyreqeusts_table"]/table/tbody/tr')
for row in price_table:
    print('row html:', row.html)
    print('row text:', row.text) # empty ???
    for col in row.find_by_tag('td'):
        print(' col html:', col.html)
        print(' col text:', col.text) # empty ???
browser.quit()
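Since row.html does contain the expected data, one possible workaround is to pull the cell text out of that HTML yourself instead of relying on .text. Below is a minimal sketch using only the standard library's html.parser. The names CellTextParser and cells_from_html are made up for illustration, and the sample row is invented, not real Steam output.

```python
from html.parser import HTMLParser


class CellTextParser(HTMLParser):
    """Collect the text content of every <td>/<th> cell in an HTML fragment."""

    def __init__(self):
        super().__init__()
        self.cells = []      # finished cell texts, in document order
        self._parts = None   # text pieces of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag in ("td", "th"):
            self._parts = []  # start collecting text for a new cell

    def handle_data(self, data):
        if self._parts is not None:
            self._parts.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._parts is not None:
            self.cells.append("".join(self._parts).strip())
            self._parts = None


def cells_from_html(fragment):
    """Return the stripped text of each table cell found in `fragment`."""
    parser = CellTextParser()
    parser.feed(fragment)
    parser.close()
    return parser.cells


# Invented sample standing in for the value of row.html
sample_row = '<td align="right">$180.00</td><td align="right">3</td>'
print(cells_from_html(sample_row))  # ['$180.00', '3']
```

In the loop above, this would be called as cells_from_html(row.html) for each row. It assumes row.html returns the row's markup including its td elements, which matches what the html printout shows here.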