Tags: python, selenium, web-scraping, list-comprehension, webdriverwait

How to scrape the names of all the artists from the table using Selenium and Python?


I am trying to scrape a website listing the top 1000 artists and append their names to a list, so that I can run a lyrical analysis by searching for each artist. The website has an option to display all 1000 artists at once, so I used Selenium to select that option. From there, I find the artist names, which gives me a list of WebElements, and iterate through that list to read each element's text and append it to my list. The program keeps throwing a StaleElementReferenceException after obtaining a certain number of artists.

I tried several suggested fixes, such as a wait-until statement or a try/except block, but they ended up crashing the program. Most solutions I have seen apply to cases where the exception occurs while clicking or otherwise interacting with a web element. However, I am not changing anything on the page after I select my option, so I am not sure where I am going wrong. I am fairly new to Selenium, so I am not sure whether this is the best way to obtain the artist names. Any help would be appreciated.

My code:

from selenium import webdriver
from selenium.webdriver.support.ui import Select
from selenium.webdriver.common.by import By
from webdriver_manager.chrome import ChromeDriverManager

driver = webdriver.Chrome(ChromeDriverManager().install())
driver.get('https://chartmasters.org/most-streamed-artists-ever-on-spotify/')

try:
    # get the select tag
    select = Select(driver.find_element(By.CSS_SELECTOR, '#table_1_length > label > div > select'))
    # select by value (select All option to get all 1000 artists)
    select.select_by_value('-1')

    all_artists = []
    all_artists_references = driver.find_elements(By.CLASS_NAME, 'bolded.column-artist-name')

    for element in all_artists_references:
        print(element.text)
        all_artists.append(element.text)

    print(all_artists)

finally:
    driver.quit()

Solution

  • To extract and print all 1000 artist names, induce WebDriverWait for visibility_of_all_elements_located() and collect the text of each element in a list comprehension. You can use either of the following locator strategies:

    • Using CSS_SELECTOR:

      print([my_elem.text for my_elem in WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR, "table#table_1 tbody tr[role='row'] td:nth-of-type(2)")))])
      
    • Using XPATH:

      print([my_elem.text for my_elem in WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.XPATH, "//table[@id='table_1']//tbody//tr[@role='row']/td[2]")))])
      
    • Note: You have to add the following imports:

      from selenium.webdriver.support.ui import WebDriverWait
      from selenium.webdriver.common.by import By
      from selenium.webdriver.support import expected_conditions as EC
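
Putting the pieces together, a minimal sketch of the whole flow might look like the following. It assumes the stale references arise because the table is redrawn after the "All" option is selected, which the question does not confirm. The function names scrape_artist_names and clean_names are hypothetical helpers introduced here for illustration; the selectors are the ones from the question and the answer above.

```python
def clean_names(raw_texts):
    """Strip surrounding whitespace and drop empty strings from cell texts."""
    return [text.strip() for text in raw_texts if text and text.strip()]


def scrape_artist_names(driver, timeout=20):
    """Select the "All" page length, then return the artist-name column.

    `driver` is an already-created Selenium WebDriver that has loaded the page.
    Both function names are hypothetical, not part of Selenium.
    """
    # Imported here so clean_names() stays usable without Selenium installed.
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import Select, WebDriverWait

    wait = WebDriverWait(driver, timeout)

    # Wait for the page-length drop-down to exist, then choose "All" (-1).
    dropdown = wait.until(EC.presence_of_element_located(
        (By.CSS_SELECTOR, "#table_1_length > label > div > select")))
    Select(dropdown).select_by_value("-1")

    # Locate the cells only after the selection, and read their text right
    # away, so that no reference is held across a redraw of the table.
    cells = wait.until(EC.visibility_of_all_elements_located(
        (By.CSS_SELECTOR,
         "table#table_1 tbody tr[role='row'] td:nth-of-type(2)")))
    return clean_names(cell.text for cell in cells)
```

To use it, create the driver and call driver.get() as in the question, call scrape_artist_names(driver) inside the try block, and keep driver.quit() in the finally clause. The wait may return before the redraw has finished, in which case fewer than 1000 names come back, so it is worth checking the length of the result.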