Get specific information from HTML code


The idea is to collect the SoundCloud user IDs (not names) of everyone who posted a track whose title starts with a given letter, e.g. "f", within a given period, in our case the past year.

I applied filters on SoundCloud and got the results at the following URL: https://soundcloud.com/search/sounds?q=f&filter.created_at=last_year&filter.genre_or_tag=hip-hop%20%26%20rap

I found the first user's ID ("wavey-hefner") in the following line of HTML: <a class="sound__coverArt" href="/wavey-hefner/foreign" draggable="true">

I want to get every user's ID from the whole HTML.

My code is:

import requests
import re
from bs4 import BeautifulSoup
html = requests.get("https://soundcloud.com/search/sounds?q=f&filter.created_at=last_year&filter.genre_or_tag=hip-hop%20%26%20rap")
soup = BeautifulSoup(html.text, 'html.parser')
for id in soup.findAll("a", {"class" : "sound_coverArt"}):
    print (id.get('href'))

It returns nothing :(


Solution

  • The page is rendered with JavaScript, so the HTML that requests downloads does not yet contain the track links. You can use Selenium to render it. First install Selenium:

    pip3 install selenium
    

    Then get a driver, e.g. ChromeDriver from https://sites.google.com/a/chromium.org/chromedriver/downloads, and put it on your PATH. (If you are on Windows or Mac, you can get a headless version of Chrome, Canary, if you like.)

    from bs4 import BeautifulSoup
    from selenium import webdriver
    import time

    browser = webdriver.Chrome()
    url = 'https://soundcloud.com/search/sounds?q=f&filter.created_at=last_year&filter.genre_or_tag=hip-hop%20%26%20rap'
    browser.get(url)
    time.sleep(5)  # give the JavaScript time to render the results
    # To load more results, scroll to the bottom of the page (repeat if you want to)
    browser.execute_script("window.scrollTo(0, document.body.scrollHeight);")
    time.sleep(5)
    html_source = browser.page_source
    browser.quit()

    # Parse the rendered HTML; note the double underscore in the class name
    soup = BeautifulSoup(html_source, 'html.parser')
    for link in soup.find_all("a", {"class": "sound__coverArt"}):
        print(link.get('href'))
    

    Outputs:

    /tee-grizzley/from-the-d-to-the-a-feat-lil-yachty
    /empire/fat-joe-remy-ma-all-the-way-up-ft-french-montana
    /tee-grizzley/first-day-out
    /21savage/feel-it
    /pluggedsoundz/famous-dex-geek-1
    /rodshootinbirds/fairytale-x-rod-da-god
    /chancetherapper/finish-line-drown-feat-t-pain-kirk-franklin-eryn-allen-kane-noname
    /alkermith/future-low-life-ft-the-weeknd-evol
    /javon-woodbridge/fabolous-slim-thick
    /hamburgerhelper/feed-the-streets-prod-dequexatron-1000
    /rob-neal-139819089/french-montana-lockjaw-remix-ft-gucci-mane-kodak-black
    /pluggedsoundz/famous-dex-energy
    /ovosoundradiohits/future-ft-drake-used-to-this
    /pluggedsoundz/famous
    /a-boogie-wit-da-hoodie/fucking-kissing-feat-chris-brown
    /wavey-hefner/foreign
    /jalensantoy/foreplay
    /yvng_swag/fall-in-luv
    /rich-the-kid/intro-prod-by-lab-cook
    /empire/fat-joe-remy-ma-money-showers-feat-ty-dolla-ign
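
Since the question asks for the user IDs rather than the full links, the hrefs above still need to be split. Here is a minimal sketch, assuming every track href has the form /<user-id>/<track-slug> as in the output above. The function name extract_user_ids and its slug_prefix parameter are made up for illustration, and the track slug is only an approximation of the track title, so treat the prefix filter as a rough check.

```python
def extract_user_ids(hrefs, slug_prefix=None):
    """Return unique user IDs from '/<user-id>/<track-slug>' hrefs, in first-seen order.

    slug_prefix is a hypothetical optional filter: when given, only tracks
    whose slug starts with that prefix (case-insensitive) are counted.
    """
    user_ids = []
    for href in hrefs:
        if not href:
            continue  # link.get('href') can return None
        parts = href.strip("/").split("/")
        if len(parts) != 2:
            continue  # skip anything that is not /<user-id>/<track-slug>
        user_id, slug = parts
        if slug_prefix and not slug.lower().startswith(slug_prefix.lower()):
            continue
        if user_id not in user_ids:
            user_ids.append(user_id)
    return user_ids


hrefs = [
    "/tee-grizzley/from-the-d-to-the-a-feat-lil-yachty",
    "/tee-grizzley/first-day-out",
    "/wavey-hefner/foreign",
    "/rich-the-kid/intro-prod-by-lab-cook",
]
print(extract_user_ids(hrefs))
# ['tee-grizzley', 'wavey-hefner', 'rich-the-kid']
print(extract_user_ids(hrefs, slug_prefix="f"))
# ['tee-grizzley', 'wavey-hefner']
```

In the Selenium script you could collect link.get('href') into a list instead of printing it, then pass that list to this function.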