I am trying to scrape zk.fm in order to download music, but it's giving me some trouble. I'm using urllib3 to generate a response, but this always yields a Bad Gateway error. Accessing the website through a browser works perfectly fine.
This is my code (with a random fake user-agent). I'm trying to access "http://zk.fm/mp3/search?keywords=" followed by some keywords which indicate the song name and artist, for example "http://zk.fm/mp3/search?keywords=childish+gambino+heartbeat".
from bs4 import BeautifulSoup
from random import choice
import urllib3
desktop_agents = ['Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/54.0.2840.99 Safari/537.36',
'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/54.0.2840.99 Safari/537.36',
'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/54.0.2840.99 Safari/537.36',
'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_1) AppleWebKit/602.2.14 (KHTML, like Gecko) Version/10.0.1 Safari/602.2.14',
'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/54.0.2840.71 Safari/537.36',
'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/54.0.2840.98 Safari/537.36',
'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/54.0.2840.98 Safari/537.36',
'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/54.0.2840.71 Safari/537.36',
'Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/54.0.2840.99 Safari/537.36',
'Mozilla/5.0 (Windows NT 10.0; WOW64; rv:50.0) Gecko/20100101 Firefox/50.0']
def random_headers():
    return {'User-Agent': choice(desktop_agents)}
ua = random_headers()
http = urllib3.PoolManager(10, headers=ua)
response = http.request('GET', "http://zk.fm/mp3/search?keywords=childish+gambino+heartbeat")
soup = BeautifulSoup(response.data, 'html.parser')
Is there a way to work around the 502 Error, or is it out of my control?
You need to persist cookies between requests, then request the site's home page first, followed by the search URL. Personally I suggest python-requests, but it is up to you. See here for discussion.
I tested this as follows: visiting the search page first gave a 502; visiting the home page gave a 200; visiting the search page again then gave a 200. After clearing cookies, the search page returned 502 again. So the missing cookies must be the problem.
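The home-page-then-search sequence described above can be sketched roughly as follows. This is only a minimal illustration using the standard library (`http.cookiejar` and `urllib.request`) rather than python-requests or urllib3. The helper names `build_search_url`, `make_opener` and `fetch_search` are hypothetical, and `http://zk.fm/` is assumed to be the home page that sets the cookies.

```python
from http.cookiejar import CookieJar
from urllib.parse import urlencode
from urllib.request import HTTPCookieProcessor, build_opener

# Assumption: the site's home page is the one that hands out the cookies.
HOME_URL = "http://zk.fm/"
SEARCH_URL = "http://zk.fm/mp3/search"
USER_AGENT = "Mozilla/5.0 (Windows NT 10.0; WOW64; rv:50.0) Gecko/20100101 Firefox/50.0"


def build_search_url(keywords):
    """Return the search URL for a free-text query (spaces become '+')."""
    return SEARCH_URL + "?" + urlencode({"keywords": keywords})


def make_opener():
    """Create an opener that stores cookies and sends them back on later requests."""
    opener = build_opener(HTTPCookieProcessor(CookieJar()))
    opener.addheaders = [("User-Agent", USER_AGENT)]
    return opener


def fetch_search(opener, keywords, timeout=10):
    """Visit the home page first to receive cookies, then fetch the search page."""
    opener.open(HOME_URL, timeout=timeout).close()
    with opener.open(build_search_url(keywords), timeout=timeout) as response:
        return response.read()


if __name__ == "__main__":
    html = fetch_search(make_opener(), "childish gambino heartbeat")
    print(len(html))
```

The important part is that both requests go through the same opener, so whatever cookies the home page sets are sent along with the search request. With python-requests, a `requests.Session` object plays the same role.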