I am trying to scrape a website of the top 1000 artists and append them to a list in order to perform a lyrical analysis by searching the artists' names. The website I am using has the option to display All 1000 artists at once and so I used selenium to select that choice. From there, I find the artist names and have them in a list of WebElements. I iterate through the list in order to get the text element and append it to my list. The program keeps throwing a StaleElementReferenceException after obtaining a certain number of artists as shown below.
I tried a number of suggested options such as using a wait until statement or a try and catch statement but ended up crashing the program. Most solutions I have seen occurred when clicking or interacting with a web element however I am not changing anything on the page after I select my option so I am not sure where I am going wrong. I am fairly new to selenium so I am not sure if this is the best way to obtain the artist names. Any help would be appreciated.
My code:
from selenium import webdriver
from selenium.webdriver.support.ui import Select
from selenium.webdriver.common.by import By
from webdriver_manager.chrome import ChromeDriverManager
driver = webdriver.Chrome(ChromeDriverManager().install())
driver.get('https://chartmasters.org/most-streamed-artists-ever-on-spotify/')
try:
# get the select tag
select = Select(driver.find_element(By.TAG_NAME,'#table_1_length > label > div > select'))
# select by value (select All option to get all 1000 artists)
select.select_by_value('-1')
all_artists = []
all_artists_references = driver.find_elements(By.CLASS_NAME, 'bolded.column-artist-name')
for element in all_artists_references:
print(element.text)
all_artists.append(element.text)
print(all_artists)
finally:
driver.quit()
To extract and print all the 1000 artist names you need to induce WebDriverWait for visibility_of_all_elements_located() using List Comprehension you can use either of the following Locator Strategies:
Using CSS_SELECTOR:
print([my_elem.text for my_elem in WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR, "table#table_1 tbody tr[role='row'] td:nth-of-type(2)")))])
Using XPATH:
print([my_elem.text for my_elem in WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.XPATH, "//table[@id='table_1']//tbody//tr[@role='row']//following::td[2]")))])
Note : You have to add the following imports :
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC