python, web-scraping, spotify, splinter

Scrape Spotify web interface


I'm trying to get the play counts for the top songs of several artists on Spotify, using Python and splinter.

If you fill in the username and password below with yours, you should be able to run the code.

from splinter import Browser
import time
from bs4 import BeautifulSoup

browser = Browser()
url = 'http://play.spotify.com'
browser.visit(url)
time.sleep(2)
button = browser.find_by_id('has-account')
button.click()
time.sleep(1)
browser.fill('username', 'your_username')
browser.fill('password', 'your_password')
buttons = browser.find_by_css('button')
visible_buttons = [button for button in buttons if button.visible]
login_button = visible_buttons[-1]
login_button.click()
time.sleep(1)
browser.visit('https://play.spotify.com/artist/5YGY8feqx7naU7z4HrwZM6')
time.sleep(10)

So far, so good. In the Firefox window that opens, you can see Miley Cyrus's artist page, including the number of plays for the top tracks.

If you open up the Firefox Developer Tools Inspector and hover, you can see the name of the song in .tl-highlight elements, and the number of plays in .tl-listen-count elements. However, I've found it impossible (at least on my machine) to access these elements using splinter. Moreover, when I try to get the source for the entire page, the elements that I can see by hovering my mouse over them in Firefox don't show up in what is ostensibly the page source.

html = browser.html
soup = BeautifulSoup(html)
output = soup.prettify()
with open('miley_cyrus_artist_page.html', 'w') as output_f:
    output_f.write(output)
browser.quit()

I don't know enough about web programming to tell what the issue is here: Firefox sees all the DOM elements clearly, but splinter, which is driving Firefox, does not.


Solution

  • Many thanks to @alecxe. The track list is rendered inside an iframe, so the following code reads the HTML from that frame rather than from the top-level page.

    from splinter import Browser
    import time
    from bs4 import BeautifulSoup
    import codecs
    
    browser = Browser()
    url = 'http://play.spotify.com'
    browser.visit(url)
    time.sleep(2)
    button = browser.find_by_id('has-account')
    button.click()
    time.sleep(1)
    browser.fill('username', 'your_username')
    browser.fill('password', 'your_password')
    buttons = browser.find_by_css('button')
    visible_buttons = [button for button in buttons if button.visible]
    login_button = visible_buttons[-1]
    login_button.click()
    time.sleep(1)
    browser.visit('https://play.spotify.com/artist/5YGY8feqx7naU7z4HrwZM6')
    time.sleep(30)
    
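    # Index 6 was the frame holding the track list on this page;
    # other pages (or later versions of the site) may use a different index.
    CORRECT_FRAME_INDEX = 6
    with browser.get_iframe(CORRECT_FRAME_INDEX) as iframe:
        html = iframe.html
        # Name the parser explicitly so the result does not depend on what is installed.
        soup = BeautifulSoup(html, 'html.parser')
        output = soup.prettify()
        # Write as UTF-8 so non-ASCII track names do not raise encoding errors.
        with codecs.open('test.html', 'w', 'utf-8') as output_f:
            output_f.write(output)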
    browser.quit()
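
The frame index of 6 above was found by trial and error. As a rough sketch, the search could be automated by checking each top-level iframe for the class name seen in the inspector. The helper name find_frame_index is hypothetical and not part of splinter. It assumes the target frame is a direct child of the page (not nested inside another frame) and uses a plain substring match on the frame's HTML, which is only a heuristic.

```python
def find_frame_index(browser, css_class):
    """Return the index of the first top-level iframe whose HTML mentions
    css_class, or None if no frame matches.

    `browser` is expected to be a splinter Browser. Only find_by_tag,
    get_iframe and the .html property are used.
    """
    # Count the iframes in the top-level document.
    frame_count = len(browser.find_by_tag('iframe'))
    for index in range(frame_count):
        # get_iframe switches into the frame for the duration of the block.
        with browser.get_iframe(index) as iframe:
            # Substring check: crude, but enough to spot the class name.
            if css_class in iframe.html:
                return index
    return None
```

With the browser from the code above already on the artist page, find_frame_index(browser, 'tl-listen-count') would stand in for the hard-coded CORRECT_FRAME_INDEX. If it returns None, the content may not have loaded yet or may sit in a nested frame.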