I was new to web scraping and was trying to create a scraper that takes a playlist link and gets the list of songs and their artists.
The site kept rejecting my connection because it thought I was a bot, so I used UserAgent from fake_useragent to generate a fake User-Agent string and try to bypass the filter.
It sort of worked. The problem was that when I visited the site in a browser I could see the contents of the playlist, but when I fetched the HTML with requests, the playlist area was just a big blank space.
Maybe I have to wait for the page to load? Or is there a stronger bot filter?
import requests
from bs4 import BeautifulSoup
from fake_useragent import UserAgent

ua = UserAgent()
melon_site = "http://kko.to/IU8zwNmjM"
# Send a random browser-like User-Agent to get past the bot filter.
headers = {'User-Agent': ua.random}
result = requests.get(melon_site, headers=headers)
print(result.status_code)

# Parse and dump the returned HTML; the playlist area comes back empty.
src = result.content
soup = BeautifulSoup(src, 'html.parser')
print(soup)
To get the content you want, request the URL used in the script below rather than the short link.
The following attempt should fetch the artist names and their song names.
import requests
from bs4 import BeautifulSoup

url = 'https://www.melon.com/mymusic/playlist/mymusicplaylistview_listSong.htm?plylstSeq=473505374'
# A simple browser-like User-Agent was enough for this request.
r = requests.get(url, headers={"User-Agent": "Mozilla/5.0"})
soup = BeautifulSoup(r.text, "html.parser")

# Keep only the table rows that contain an artist cell.
for item in soup.select("tr:has(#artistName)"):
    artist_name = item.select_one("#artistName > a[href*='goArtistDetail']")['title']
    song = item.select_one("a[href*='playSong']")['title']
    print(artist_name, song)
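The output looks like this (the Korean text is part of each title attribute; "페이지 이동" means roughly "go to page" and "재생 - 새 창" means roughly "play - new window"):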
Martin Garrix - 페이지 이동 Used To Love (feat. Dean Lewis) 재생 - 새 창
Post Malone - 페이지 이동 Circles 재생 - 새 창
Marshmello - 페이지 이동 Here With Me 재생 - 새 창
Coldplay - 페이지 이동 Cry Cry Cry 재생 - 새 창
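If you only want the bare names, one option is to trim those trailing phrases from the title strings. The sketch below assumes the two suffixes are always exactly the ones visible in the sample output; strip_suffix and clean_row are hypothetical helper names, not part of any library.

```python
# Suffixes taken from the sample output; assumed to be fixed text that the
# site appends to every title attribute.
ARTIST_SUFFIX = " - 페이지 이동"  # roughly "go to page"
SONG_SUFFIX = " 재생 - 새 창"     # roughly "play - new window"


def strip_suffix(text, suffix):
    """Return text without the trailing suffix, if it is present."""
    text = text.strip()
    if text.endswith(suffix):
        return text[: -len(suffix)]
    return text


def clean_row(artist_title, song_title):
    """Hypothetical helper: clean one (artist, song) pair of title strings."""
    return (
        strip_suffix(artist_title, ARTIST_SUFFIX),
        strip_suffix(song_title, SONG_SUFFIX),
    )


print(clean_row("Post Malone - 페이지 이동", "Circles 재생 - 새 창"))
# prints ('Post Malone', 'Circles')
```

In the loop you would call clean_row(artist_name, song) just before printing. Strings without the suffix are returned unchanged, so the helper is harmless if the site drops the extra text.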
Note: your BeautifulSoup version should be 4.7.0 or later for the script to support the :has() pseudo-class selector.
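To check your setup without hitting the site, you can try the same selectors on a small static snippet. The HTML below is made up for illustration and only imitates the row structure the script relies on; the real page markup may differ.

```python
import bs4
from bs4 import BeautifulSoup

# Installed version; :has() needs 4.7.0 or later.
print(bs4.__version__)

# Made-up markup that only imitates the rows the script looks for.
html = """
<table>
  <tr><th>Song</th><th>Artist</th></tr>
  <tr>
    <td><a href="javascript:playSong('1');" title="Circles">Circles</a></td>
    <td><div id="artistName"><a href="javascript:goArtistDetail('2');" title="Post Malone">Post Malone</a></div></td>
  </tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")

# The header row has no #artistName element, so :has() filters it out.
rows = soup.select("tr:has(#artistName)")
for row in rows:
    artist = row.select_one("#artistName > a[href*='goArtistDetail']")["title"]
    song = row.select_one("a[href*='playSong']")["title"]
    print(artist, song)
```

If the loop prints one artist and song pair, the selector support is in place and any remaining problem is with the request itself.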