I am trying to scrape the country information from the website below:
https://www.morningstar.com/etfs/xnas/vnqi/portfolio
This entails clicking the 'Country' selection in the Exposure section, then moving through pages 1, 2, 3, etc. using the arrows at the bottom of the section. Nothing I have tried seems to work. Is there a way to do it using Selenium in Python?
Many thanks!
Here is the code I used:
urlpage = 'https://www.morningstar.com/etfs/xnas/vnqi/portfolio'
driver = webdriver.Chrome(options=options, executable_path='D:\Python\Python38\chromedriver.exe')
driver.get(urlpage)
elements=WebDriverWait(driver, 10).until(EC.presence_of_all_elements_located((By.XPATH, "//a[text()='Country']")))
for elem in elements:
    elem.click()
and this is the error message:
TimeoutException                          Traceback (most recent call last)
<ipython-input-3-bf16ea3f65c0> in <module>
23 driver = webdriver.Chrome(options=options, executable_path='D:\Python\Python38\chromedriver.exe')
24 driver.get(urlpage)
---> 25 elements=WebDriverWait(driver, 10).until(EC.presence_of_all_elements_located((By.XPATH, "//a[text()='Country']")))
26 for elem in elements:
27 elem.click()
D:\Anaconda\lib\site-packages\selenium\webdriver\support\wait.py in until(self, method, message)
78 if time.time() > end_time:
79 break
---> 80 raise TimeoutException(message, screen, stacktrace)
81
82 def until_not(self, method, message=''):
TimeoutException: Message:
Sorry, not sure how to format the error message better. Thanks again.
It seems you didn't check what is actually in the page's HTML, which is the most important step.
There is NO <a> with the text Country on this page, so the wait for //a[text()='Country'] never finds anything and ends in a TimeoutException.
There is an <input> with value="Country".
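One way to see the difference is to run both locators against a small, made-up fragment shaped like the filter described above. The markup below is an assumption for illustration, not copied from the Morningstar page, and the standard library's ElementTree stands in for the browser. ElementTree has no `text()` function, so `[.='Country']` is used as the closest equivalent.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment: radio inputs wrapped in labels, no <a> elements.
SAMPLE = """
<div class="exposure">
  <label><input type="radio" name="view" value="Region" /> Region</label>
  <label><input type="radio" name="view" value="Country" /> Country</label>
</div>
"""

root = ET.fromstring(SAMPLE)

# Closest ElementTree equivalent of //a[text()='Country']
anchors = root.findall(".//a[.='Country']")

# Equivalent of //input[@value="Country"]
inputs = root.findall(".//input[@value='Country']")

print(len(anchors), len(inputs))
```

The first locator finds nothing and the second finds one element. In the browser, the same mismatch is what makes the explicit wait run out of time.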
This Selenium code works for me:
import time
from selenium import webdriver
from selenium.webdriver.common.by import By

url = 'https://www.morningstar.com/etfs/xnas/vnqi/portfolio'

driver = webdriver.Chrome()
driver.get(url)
time.sleep(2)  # crude wait for the page to render

# find_element(By.XPATH, ...) works in Selenium 3 and 4;
# the find_element_by_xpath helpers were removed in newer Selenium 4 releases
country = driver.find_element(By.XPATH, '//input[@value="Country"]')
country.click()
time.sleep(1)

next_page = driver.find_element(By.XPATH, '//a[@aria-label="Go to Next Page"]')

while True:
    # get data
    table_rows = driver.find_elements(By.XPATH, '//table[@class="sal-country-exposure__country-table"]//tr')
    for row in table_rows[1:]:  # skip header
        elements = row.find_elements(By.XPATH, './/span')  # relative xpath with `.//`
        print(elements[0].text, elements[1].text, elements[2].text)

    # check if there is next page
    disabled = next_page.get_attribute('aria-disabled')
    #print('disabled:', disabled)
    if disabled:
        break

    # go to next page
    next_page.click()
    time.sleep(1)
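A note on the `aria-disabled` check in the loop above: `get_attribute` returns `None` when the attribute is absent and typically a string when it is present, and in Python the string `'false'` is still truthy. Which of these the page uses for an enabled arrow is an assumption here, so the hypothetical helper `is_disabled` below is a sketch that handles both cases.

```python
def is_disabled(attr_value):
    """Return True only when an aria-disabled value means 'disabled'.

    attr_value is whatever get_attribute('aria-disabled') returned:
    None if the attribute is missing, otherwise usually a string.
    """
    if attr_value is None:
        return False
    return str(attr_value).strip().lower() == "true"


# Plain truthiness would treat the string "false" as disabled.
for value in (None, "false", "true"):
    print(repr(value), bool(value), is_disabled(value))
```

In the loop, the check could then be written as `if is_disabled(next_page.get_attribute('aria-disabled')): break`.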
Result:
Japan 22.08 13.47
China 10.76 1.45
Australia 9.75 6.05
Hong Kong 9.52 6.04
Germany 8.84 5.77
Singapore 6.46 4.33
United Kingdom 6.22 5.77
Sweden 3.48 2.00
France 3.18 2.58
Canada 2.28 2.92
Switzerland 1.78 0.69
Belgium 1.63 1.31
Philippines 1.53 0.15
Israel 1.47 0.16
Thailand 0.98 0.09
India 0.87 0.11
South Africa 0.87 0.21
Taiwan 0.83 0.08
Mexico 0.80 0.33
Spain 0.62 0.84
Malaysia 0.54 0.08
Brazil 0.52 0.06
Austria 0.51 0.16
New Zealand 0.41 0.21
Indonesia 0.37 0.02
Norway 0.37 0.29
United States 0.29 44.09
Netherlands 0.24 0.19
Chile 0.21 0.01
Ireland 0.16 0.19
South Korea 0.15 0.00
Turkey 0.08 0.02
Russia 0.08 0.00
Finland 0.06 0.16
Poland 0.05 0.00
Greece 0.05 0.00
Italy 0.02 0.05
Argentina 0.00 0.00
Colombia 0.00 0.00
Czech Republic 0.00 0.00
Denmark 0.00 0.00
Estonia 0.00 0.00
Hungary 0.00 0.00
Latvia 0.00 0.00
Lithuania 0.00 0.00
Pakistan 0.00 0.00
Peru 0.00 0.00
Portugal 0.00 0.00
Slovakia 0.00 0.00
Venezuela 0.00 0.00
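If the rows are needed as data rather than printed text, each line can be split from the right so that multi-word country names stay intact. This is a sketch with a hypothetical `parse_row` helper; the two numeric columns are left unnamed because the output above does not label them.

```python
def parse_row(line):
    """Split 'Hong Kong 9.52 6.04' into ('Hong Kong', 9.52, 6.04)."""
    # Split from the right: the last two fields are numbers,
    # everything before them is the country name.
    country, first, second = line.rsplit(None, 2)
    return country, float(first), float(second)


# Three lines taken from the result above.
sample = [
    "Japan 22.08 13.47",
    "Hong Kong 9.52 6.04",
    "United Kingdom 6.22 5.77",
]

rows = [parse_row(line) for line in sample]
for country, first, second in rows:
    print(f"{country},{first},{second}")
```

Inside the scraping loop, the same tuple could be built directly from `elements[0].text`, `elements[1].text` and `elements[2].text` instead of parsing printed lines.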