Tags: python, web-scraping, beautifulsoup, python-requests, spotify

Scraping data from Spotify charts


I want to scrape the daily top 200 songs from the Spotify Charts website. I am parsing the page's HTML and trying to get each song's artist, name, and stream count, but the following code returns nothing. How can I get this information with this approach?

for a in soup.find("div",{"class":"Container-c1ixcy-0 krZEp encore-base-set"}):
    for b in a.findAll("main",{"class":"Main-tbtyrr-0 flXzSu"}):
        for c in b.findAll("div",{"class":"Content-sc-1n5ckz4-0 jyvkLv"}):
            for d in c.findAll("div",{"class":"TableContainer__Container-sc-86p3fa-0 fRKUEz"}):
                print(d) 

This is the song list that I want to scrape: https://charts.spotify.com/charts/view/regional-tr-daily/2022-09-14

The page's HTML was attached to the original post as a screenshot, which is not reproduced here.


Solution

  • In the example link you provided, there aren't 200 songs, but only 50. The following is one way to get those songs:

    from selenium import webdriver
    from selenium.webdriver.chrome.service import Service
    from selenium.webdriver.chrome.options import Options
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.common.exceptions import TimeoutException
    from selenium.webdriver.support import expected_conditions as EC
    import time as t
    import pandas as pd
    
    
    chrome_options = Options()
    chrome_options.add_argument("--no-sandbox")
    chrome_options.add_argument("window-size=1920,1080")
    
    webdriver_service = Service("chromedriver/chromedriver") ## path to where you saved chromedriver binary
    browser = webdriver.Chrome(service=webdriver_service, options=chrome_options)
    
    url = 'https://charts.spotify.com/charts/view/regional-tr-daily/2022-09-14'
    browser.get(url)
    wait = WebDriverWait(browser, 5)
    # Dismiss the cookie banner if it shows up within the 5-second wait.
    try:
        wait.until(EC.element_to_be_clickable((By.ID, "onetrust-accept-btn-handler"))).click()
        print("accepted cookies")
    except Exception:
        print('no cookie button')
    # Remove the page header from the DOM before clicking further down the page.
    header_to_be_removed = wait.until(EC.element_to_be_clickable((By.CSS_SELECTOR, 'header[data-testid="charts-header"]')))
    browser.execute_script("""
    var element = arguments[0];
    element.parentNode.removeChild(element);
    """, header_to_be_removed)
    # Keep clicking the "load more" button until it no longer appears.
    while True:
        try:
            show_more_button = wait.until(EC.element_to_be_clickable((By.XPATH, '//div[@data-testid="load-more-entries"]//button')))
            # Accessing this property scrolls the button into view.
            show_more_button.location_once_scrolled_into_view
            t.sleep(5)
            show_more_button.click()
            print('clicked to show more')
            t.sleep(3)
        except TimeoutException:
            print('all done')
            break
    songs = wait.until(EC.presence_of_all_elements_located((By.CSS_SELECTOR, 'li[data-testid="charts-entry-item"]')))
    print('we have', len(songs), 'songs')
    song_list = []
    for song in songs:
        song.location_once_scrolled_into_view
        t.sleep(1)
        title = song.find_element(By.CSS_SELECTOR, 'p[class^="Type__TypeElement-"]')
        artist = song.find_element(By.CSS_SELECTOR, 'span[data-testid="artists-names"]')
        song_list.append((artist.text, title.text))
    # Column order matches the (artist, title) tuples appended above.
    df = pd.DataFrame(song_list, columns = ['Artist', 'Title'])
    print(df)
    

    This will print out in terminal:

    no cookie button
    clicked to show more
    clicked to show more
    clicked to show more
    clicked to show more
    all done
    we have 50 songs
    
    Artist Title
    0 Bizarrap, Quevedo: Bzrp Music Sessions, Vol. 52
    1 Harry Styles As It Was
    2 Bad Bunny, Me Porto Bonito
    3 Bad Bunny Tití Me Preguntó
    4 Manuel Turizo La Bachata
    5 ROSALÍA DESPECHÁ
    6 BLACKPINK Pink Venom
    7 David Guetta, I'm Good (Blue)
    8 OneRepublic I Ain't Worried
    9 Bad Bunny Efecto
    10 Chris Brown Under The Influence
    11 Steve Lacy Bad Habit
    12 Bad Bunny, Ojitos Lindos
    13 Kate Bush Running Up That Hill (A Deal With God) - 2018 Remaster
    14 Joji Glimpse of Us
    15 Nicki Minaj Super Freaky Girl
    16 Bad Bunny Moscow Mule
    17 Rosa Linn SNAP
    18 Glass Animals Heat Waves
    19 KAROL G PROVENZA
    20 Charlie Puth, Left and Right (Feat. Jung Kook of BTS)
    21 Harry Styles Late Night Talking
    22 The Kid LAROI, STAY (with Justin Bieber)
    23 Tom Odell Another Love
    24 Central Cee Doja
    25 Stephen Sanchez Until I Found You
    26 Bad Bunny Neverita
    27 Post Malone, I Like You (A Happier Song) (with Doja Cat)
    28 Lizzo About Damn Time
    29 Nicky Youre, Sunroof
    30 Elton John, Hold Me Closer
    31 Luar La L Caile
    32 KAROL G, GATÚBELA
    33 The Weeknd Die For You
    34 Bad Bunny, Tarot
    35 James Hype, Ferrari
    36 Imagine Dragons Bones
    37 Elton John, Cold Heart - PNAU Remix
    38 The Neighbourhood Sweater Weather
    39 Ghost Mary On A Cross
    40 Shakira, Te Felicito
    41 Justin Bieber Ghost
    42 Bad Bunny, Party
    43 Drake, Jimmy Cooks (feat. 21 Savage)
    44 Doja Cat Vegas (From the Original Motion Picture Soundtrack ELVIS)
    45 Camila Cabello, Bam Bam (feat. Ed Sheeran)
    46 Rauw Alejandro, LOKERA
    47 Rels B cómo dormiste?
    48 The Weeknd Blinding Lights
    49 Arctic Monkeys 505

    Of course, you can also extract other information, such as the chart ranking or all of the artists when a track has more than one.
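
As a rough sketch of that idea, the scraped rows could be post-processed in plain Python before building the DataFrame. The helper name enrich_rows is hypothetical, and it rests on two assumptions that are not confirmed by the page itself: that the list order equals the chart order, and that multiple artists arrive as one comma-separated string. An artist whose name contains a comma would be split incorrectly.

```python
def enrich_rows(song_list):
    """Add a 1-based rank and a list of artist names to scraped rows.

    Assumes song_list holds (artist_text, title_text) tuples in chart order
    and that several artists are separated by commas in artist_text.
    """
    enriched = []
    for rank, (artist_text, title) in enumerate(song_list, start=1):
        # Split on commas and drop empty pieces left by a trailing comma.
        artists = [name.strip() for name in artist_text.split(",") if name.strip()]
        enriched.append({"rank": rank, "artists": artists, "title": title.strip()})
    return enriched


# Illustrative input in the same shape as song_list in the answer.
sample = [
    ("Bizarrap, Quevedo", "Quevedo: Bzrp Music Sessions, Vol. 52"),
    ("Bad Bunny,", "Me Porto Bonito"),
]
rows = enrich_rows(sample)
for row in rows:
    print(row["rank"], row["artists"], row["title"])
```

The resulting list of dictionaries can be passed directly to pd.DataFrame(rows) if a table is still wanted.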

    The Selenium Chrome/chromedriver setup shown here is for Linux. To adapt it to your own system, keep the imports and all of the code after the line that defines the browser, and change only the driver setup.

    Pandas documentation: https://pandas.pydata.org/pandas-docs/stable/index.html

    For selenium docs, visit: https://www.selenium.dev/documentation/