I want to scrape the daily top 200 songs from the Spotify Charts website. I am trying to parse the page's HTML to get each song's artist, name, and stream count, but the following code returns nothing. How can I get this information with this approach?
for a in soup.find("div",{"class":"Container-c1ixcy-0 krZEp encore-base-set"}):
    for b in a.findAll("main",{"class":"Main-tbtyrr-0 flXzSu"}):
        for c in b.findAll("div",{"class":"Content-sc-1n5ckz4-0 jyvkLv"}):
            for d in c.findAll("div",{"class":"TableContainer__Container-sc-86p3fa-0 fRKUEz"}):
                print(d)
For example, this is the song list I want to scrape: https://charts.spotify.com/charts/view/regional-tr-daily/2022-09-14
The example link you provided lists only 50 songs, not 200. The following is one way to get those songs with Selenium:
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.common.exceptions import NoSuchElementException, TimeoutException
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.keys import Keys
import time as t
import pandas as pd
from bs4 import BeautifulSoup

chrome_options = Options()
chrome_options.add_argument("--no-sandbox")
chrome_options.add_argument("window-size=1920,1080")

webdriver_service = Service("chromedriver/chromedriver") ## path to where you saved chromedriver binary
browser = webdriver.Chrome(service=webdriver_service, options=chrome_options)

url = 'https://charts.spotify.com/charts/view/regional-tr-daily/2022-09-14'
browser.get(url)
wait = WebDriverWait(browser, 5)

## accept the cookie banner if it shows up
try:
    wait.until(EC.element_to_be_clickable((By.ID, "onetrust-accept-btn-handler"))).click()
    print("accepted cookies")
except Exception as e:
    print('no cookie button')

## remove the page header from the DOM before scrolling and clicking
header_to_be_removed = wait.until(EC.element_to_be_clickable((By.CSS_SELECTOR, 'header[data-testid="charts-header"]')))
browser.execute_script("""
var element = arguments[0];
element.parentNode.removeChild(element);
""", header_to_be_removed)

## keep clicking the "load more" button until the wait for it times out
while True:
    try:
        show_more_button = wait.until(EC.element_to_be_clickable((By.XPATH, '//div[@data-testid="load-more-entries"]//button')))
        show_more_button.location_once_scrolled_into_view  ## accessing this property scrolls the button into view
        t.sleep(5)
        show_more_button.click()
        print('clicked to show more')
        t.sleep(3)
    except TimeoutException:
        print('all done')
        break

songs = wait.until(EC.presence_of_all_elements_located((By.CSS_SELECTOR, 'li[data-testid="charts-entry-item"]')))
print('we have', len(songs), 'songs')

song_list = []
for song in songs:
    song.location_once_scrolled_into_view  ## scroll each entry into view before reading it
    t.sleep(1)
    title = song.find_element(By.CSS_SELECTOR, 'p[class^="Type__TypeElement-"]')
    artist = song.find_element(By.CSS_SELECTOR, 'span[data-testid="artists-names"]')
    song_list.append((artist.text, title.text))
## column order must match the (artist, title) order of the tuples above
df = pd.DataFrame(song_list, columns = ['Artist', 'Title'])
print(df)
This will print out in terminal:
no cookie button
clicked to show more
clicked to show more
clicked to show more
clicked to show more
all done
we have 50 songs
| | Artist | Title |
---|---|---|
0 | Bizarrap, | Quevedo: Bzrp Music Sessions, Vol. 52 |
1 | Harry Styles | As It Was |
2 | Bad Bunny, | Me Porto Bonito |
3 | Bad Bunny | Tití Me Preguntó |
4 | Manuel Turizo | La Bachata |
5 | ROSALÍA | DESPECHÁ |
6 | BLACKPINK | Pink Venom |
7 | David Guetta, | I'm Good (Blue) |
8 | OneRepublic | I Ain't Worried |
9 | Bad Bunny | Efecto |
10 | Chris Brown | Under The Influence |
11 | Steve Lacy | Bad Habit |
12 | Bad Bunny, | Ojitos Lindos |
13 | Kate Bush | Running Up That Hill (A Deal With God) - 2018 Remaster |
14 | Joji | Glimpse of Us |
15 | Nicki Minaj | Super Freaky Girl |
16 | Bad Bunny | Moscow Mule |
17 | Rosa Linn | SNAP |
18 | Glass Animals | Heat Waves |
19 | KAROL G | PROVENZA |
20 | Charlie Puth, | Left and Right (Feat. Jung Kook of BTS) |
21 | Harry Styles | Late Night Talking |
22 | The Kid LAROI, | STAY (with Justin Bieber) |
23 | Tom Odell | Another Love |
24 | Central Cee | Doja |
25 | Stephen Sanchez | Until I Found You |
26 | Bad Bunny | Neverita |
27 | Post Malone, | I Like You (A Happier Song) (with Doja Cat) |
28 | Lizzo | About Damn Time |
29 | Nicky Youre, | Sunroof |
30 | Elton John, | Hold Me Closer |
31 | Luar La L | Caile |
32 | KAROL G, | GATÚBELA |
33 | The Weeknd | Die For You |
34 | Bad Bunny, | Tarot |
35 | James Hype, | Ferrari |
36 | Imagine Dragons | Bones |
37 | Elton John, | Cold Heart - PNAU Remix |
38 | The Neighbourhood | Sweater Weather |
39 | Ghost | Mary On A Cross |
40 | Shakira, | Te Felicito |
41 | Justin Bieber | Ghost |
42 | Bad Bunny, | Party |
43 | Drake, | Jimmy Cooks (feat. 21 Savage) |
44 | Doja Cat | Vegas (From the Original Motion Picture Soundtrack ELVIS) |
45 | Camila Cabello, | Bam Bam (feat. Ed Sheeran) |
46 | Rauw Alejandro, | LOKERA |
47 | Rels B | cómo dormiste? |
48 | The Weeknd | Blinding Lights |
49 | Arctic Monkeys | 505 |
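Because each row above is a positional tuple, the column labels only line up if they are listed in the same order as the tuple fields. One way to make that harder to get wrong is to name the fields on every row. The following is a minimal sketch under that idea, using only the standard library and two rows taken from the output above; the helper name `to_csv` is made up for this illustration.

```python
import csv
import io

# Two (artist, title) pairs copied from the printed output above.
scraped = [("Harry Styles", "As It Was"), ("Manuel Turizo", "La Bachata")]

# Attach a field name to every value so the labels travel with the data.
records = [{"Artist": artist, "Title": title} for artist, title in scraped]


def to_csv(rows):
    """Serialize a list of {"Artist": ..., "Title": ...} dicts to CSV text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["Artist", "Title"], lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


print(to_csv(records))
```

The same list of dicts can also be passed straight to `pd.DataFrame(records)`, which takes the column names from the dict keys.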
Of course, you can also get other info, such as the chart ranking or all artists when there is more than one.
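For the multi-artist case, here is a rough sketch of how the scraped artist text could be split into individual names. It assumes (this is not verified against the live page) that the text of the artists element separates names with commas and/or line breaks; `split_artists` is a hypothetical helper, and the sample strings are placeholders rather than real chart data.

```python
import re
from typing import List


def split_artists(raw: str) -> List[str]:
    """Split a scraped artist string into individual artist names."""
    # Treat commas and line breaks as separators, then drop empty pieces.
    parts = re.split(r"[,\n]+", raw)
    return [part.strip() for part in parts if part.strip()]


# Placeholder input in the assumed "name,<newline>name" shape.
print(split_artists("Artist A,\nArtist B"))
print(split_artists("Solo Artist"))
```

Note that this would also split an artist whose own name contains a comma, so reading each artist link inside the entry separately may be more reliable if the page exposes them that way.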
The Selenium Chrome/chromedriver setup shown is for Linux. To adapt it to your own setup, keep the imports and the code that follows the browser definition, and change only how the browser is created.
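As one possible adaptation, here is a sketch that creates the browser without a hard-coded chromedriver path. It assumes Selenium 4.6 or newer, where the bundled Selenium Manager tries to locate or download a matching driver when no `Service` path is given. The function name `make_browser` and its `headless` parameter are made up for this example.

```python
def make_browser(headless: bool = False):
    """Create a Chrome browser, letting Selenium resolve the driver itself."""
    # Imported here so this snippet can be loaded even where Selenium
    # is not installed; the import only runs when the function is called.
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options

    options = Options()
    options.add_argument("--window-size=1920,1080")
    if headless:
        # Assumption: a recent Chrome that supports the new headless mode.
        options.add_argument("--headless=new")

    # No Service(...) argument: with Selenium 4.6+ the driver binary
    # is looked up (or downloaded) by Selenium Manager.
    return webdriver.Chrome(options=options)
```

With this in place, `browser = make_browser()` would replace the `Service(...)` and `webdriver.Chrome(...)` lines, and the rest of the script stays the same.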
Pandas documentation: https://pandas.pydata.org/pandas-docs/stable/index.html
For selenium docs, visit: https://www.selenium.dev/documentation/