python web-scraping beautifulsoup python-requests spotify

Scraping data from Spotify charts

I want to scrape the daily top 200 songs from the Spotify Charts website. I am trying to parse the page's HTML to get each song's artist, name, and stream count, but the following code returns nothing. How can I get this information with this approach?

for a in soup.find("div",{"class":"Container-c1ixcy-0 krZEp encore-base-set"}):
    for b in a.findAll("main",{"class":"Main-tbtyrr-0 flXzSu"}):
        for c in b.findAll("div",{"class":"Content-sc-1n5ckz4-0 jyvkLv"}):
            for d in c.findAll("div",{"class":"TableContainer__Container-sc-86p3fa-0 fRKUEz"}):
                print(d)

For example, this is the song list I want to scrape: https://charts.spotify.com/charts/view/regional-tr-daily/2022-09-14

And also this is the html code of the page.

Solution

In the example link you provided, there aren't 200 songs, but only 50. The following is one way to get those songs:

from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.common.exceptions import TimeoutException
from selenium.webdriver.support import expected_conditions as EC
import time as t
import pandas as pd


chrome_options = Options()
chrome_options.add_argument("--no-sandbox")
chrome_options.add_argument("window-size=1920,1080")

webdriver_service = Service("chromedriver/chromedriver") ## path to where you saved chromedriver binary
browser = webdriver.Chrome(service=webdriver_service, options=chrome_options)

url = 'https://charts.spotify.com/charts/view/regional-tr-daily/2022-09-14'
browser.get(url)
wait = WebDriverWait(browser, 5)
# Dismiss the cookie banner if it becomes clickable within the wait timeout.
try:
    wait.until(EC.element_to_be_clickable((By.ID, "onetrust-accept-btn-handler"))).click()
    print("accepted cookies")
except Exception:
    print('no cookie button')
# Remove the charts header from the DOM (presumably so it cannot cover the elements clicked below).
header_to_be_removed = wait.until(EC.element_to_be_clickable((By.CSS_SELECTOR, 'header[data-testid="charts-header"]')))
browser.execute_script("""
var element = arguments[0];
element.parentNode.removeChild(element);
""", header_to_be_removed)
# Keep clicking the "load more" button until it no longer appears within the wait timeout.
while True:
    try:
        show_more_button = wait.until(EC.element_to_be_clickable((By.XPATH, '//div[@data-testid="load-more-entries"]//button')))
        show_more_button.location_once_scrolled_into_view  # accessing this property scrolls the button into view
        t.sleep(5)
        show_more_button.click()
        print('clicked to show more')
        t.sleep(3)
    except TimeoutException:
        print('all done')
        break
# Collect every chart entry that is now present in the DOM.
songs = wait.until(EC.presence_of_all_elements_located((By.CSS_SELECTOR, 'li[data-testid="charts-entry-item"]')))
print('we have', len(songs), 'songs')
song_list = []
for song in songs:
    song.location_once_scrolled_into_view  # scroll the entry into view before reading its text
    t.sleep(1)
    title = song.find_element(By.CSS_SELECTOR, 'p[class^="Type__TypeElement-"]')
    artist = song.find_element(By.CSS_SELECTOR, 'span[data-testid="artists-names"]')
    song_list.append((artist.text, title.text))
df = pd.DataFrame(song_list, columns = ['Artist', 'Title'])  # column order matches the (artist, title) tuples
print(df)

This prints the following in the terminal:

no cookie button
clicked to show more
clicked to show more
clicked to show more
clicked to show more
all done
we have 50 songs

	Artist	Title
0	Bizarrap,	Quevedo: Bzrp Music Sessions, Vol. 52
1	Harry Styles	As It Was
2	Bad Bunny,	Me Porto Bonito
3	Bad Bunny	Tití Me Preguntó
4	Manuel Turizo	La Bachata
5	ROSALÍA	DESPECHÁ
6	BLACKPINK	Pink Venom
7	David Guetta,	I'm Good (Blue)
8	OneRepublic	I Ain't Worried
9	Bad Bunny	Efecto
10	Chris Brown	Under The Influence
11	Steve Lacy	Bad Habit
12	Bad Bunny,	Ojitos Lindos
13	Kate Bush	Running Up That Hill (A Deal With God) - 2018 Remaster
14	Joji	Glimpse of Us
15	Nicki Minaj	Super Freaky Girl
16	Bad Bunny	Moscow Mule
17	Rosa Linn	SNAP
18	Glass Animals	Heat Waves
19	KAROL G	PROVENZA
20	Charlie Puth,	Left and Right (Feat. Jung Kook of BTS)
21	Harry Styles	Late Night Talking
22	The Kid LAROI,	STAY (with Justin Bieber)
23	Tom Odell	Another Love
24	Central Cee	Doja
25	Stephen Sanchez	Until I Found You
26	Bad Bunny	Neverita
27	Post Malone,	I Like You (A Happier Song) (with Doja Cat)
28	Lizzo	About Damn Time
29	Nicky Youre,	Sunroof
30	Elton John,	Hold Me Closer
31	Luar La L	Caile
32	KAROL G,	GATÚBELA
33	The Weeknd	Die For You
34	Bad Bunny,	Tarot
35	James Hype,	Ferrari
36	Imagine Dragons	Bones
37	Elton John,	Cold Heart - PNAU Remix
38	The Neighbourhood	Sweater Weather
39	Ghost	Mary On A Cross
40	Shakira,	Te Felicito
41	Justin Bieber	Ghost
42	Bad Bunny,	Party
43	Drake,	Jimmy Cooks (feat. 21 Savage)
44	Doja Cat	Vegas (From the Original Motion Picture Soundtrack ELVIS)
45	Camila Cabello,	Bam Bam (feat. Ed Sheeran)
46	Rauw Alejandro,	LOKERA
47	Rels B	cómo dormiste?
48	The Weeknd	Blinding Lights
49	Arctic Monkeys	505
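
Several cells in the first data column above end with a comma, which suggests that entries with more than one artist were cut off in the scraped text. The following is a minimal sketch of tidying such cells after scraping. The helper name `clean_artist` is hypothetical, and the two sample rows are copied from the output above. It only removes the dangling comma and does not recover the missing artist names.

```python
def clean_artist(cell):
    """Strip surrounding whitespace and any trailing comma from a scraped cell."""
    return cell.strip().rstrip(",").strip()


# Two rows copied from the printed output above, as (artist, title) tuples.
rows = [
    ("Bizarrap,", "Quevedo: Bzrp Music Sessions, Vol. 52"),
    ("Harry Styles", "As It Was"),
]

# Clean only the artist cell; titles are left untouched.
cleaned = [(clean_artist(artist), title) for artist, title in rows]
for row in cleaned:
    print(row)
```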

You can extract other information the same way, such as the chart ranking or all artists when a song has more than one.
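
Since the question asked about BeautifulSoup, here is a rough sketch of how the rendered HTML could be parsed once Selenium has loaded all entries. The `data-testid` values and the `Type__TypeElement-` class prefix are taken from the code above. Everything else is an assumption made for illustration: the sample markup, the `<a>` tag per artist, the function name `parse_entries`, and the idea that list position equals chart rank. Check the real page structure before relying on it.

```python
from bs4 import BeautifulSoup

# Hypothetical markup for illustration only. The data-testid values and the
# class prefix come from the Selenium code; the <a> tags are an assumption.
SAMPLE_HTML = """
<ul>
  <li data-testid="charts-entry-item">
    <p class="Type__TypeElement-abc">Song One</p>
    <span data-testid="artists-names"><a>Artist A</a>, <a>Artist B</a></span>
  </li>
  <li data-testid="charts-entry-item">
    <p class="Type__TypeElement-abc">Song Two</p>
    <span data-testid="artists-names"><a>Artist C</a></span>
  </li>
</ul>
"""


def parse_entries(html):
    """Return (rank, title, [artists]) tuples from rendered chart HTML."""
    soup = BeautifulSoup(html, "html.parser")
    entries = []
    items = soup.select('li[data-testid="charts-entry-item"]')
    # Assumption: entries appear in chart order, so position gives the rank.
    for rank, item in enumerate(items, start=1):
        title = item.select_one('p[class^="Type__TypeElement-"]').get_text(strip=True)
        # Assumption: each artist name sits in its own <a> inside the span.
        artists = [
            a.get_text(strip=True)
            for a in item.select('span[data-testid="artists-names"] a')
        ]
        entries.append((rank, title, artists))
    return entries


entries = parse_entries(SAMPLE_HTML)
for entry in entries:
    print(entry)
```

In the Selenium script, the same function could be called as `parse_entries(browser.page_source)` after the "load more" loop has finished.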

The Selenium Chrome/chromedriver setup shown here is for Linux. To adapt it to your own setup, keep the imports and the code that follows the browser definition, and change only how the browser is created.

Pandas documentation: https://pandas.pydata.org/pandas-docs/stable/index.html

Selenium documentation: https://www.selenium.dev/documentation/