I am trying to take a string input in my Python code, turn it into a URL, and use that URL to search for the string on a website. The website I am using is songbpm.com: I want to search for a song and get back its tempo (BPM). Finding the relevant information within the HTML is not the problem; I have already done that, and my URL creation is working. Here it is:
```python
import urllib.request
import urllib.parse

song = input("")
fin = ""
for i in song:
    if i == "(":
        tempone = song
        song = tempone.split("(")[0] + tempone.split(") ")[1]
previous = ""
for i in song:
    if i.isalpha():
        temp = fin
        fin = temp + i
    else:
        if previous.isalpha():
            temp = fin
            fin = temp + "-"
    previous = i
songencoded = urllib.parse.quote(song, safe='')
print('https://songbpm.com/'+ fin.lower() + '?q=' + songencoded)
response = urllib.request.urlopen('https://songbpm.com/'+ fin.lower() + '?q=' + songencoded)
text = str(response.read()).split('\\n')
```
The URLs that are returned are identical to the URL I get when I enter the search manually on the website. However, when I run this code, it always reads the HTML of the "no results" page it gets redirected to.
Also, if I paste the generated URL into the browser, it redirects to the "no results" page. However, after I search for the same string by hand in the browser, the generated URL works as well when I retry it.
I have also observed that after manually opening a certain URL, I can run the code with the same search query and it works. It seems as if searches are cached for a while when a user, rather than code, opens them.
How do I get my code, which generates exactly the same URL, to open the page the way a user's browser does?
The site has a few extra requirements for a suitable request. Firstly, it uses cookies, so a `CookieJar` is needed. The cookies can be obtained by first requesting the homepage without making a search. This also gives you the value of `_csrf`, which is needed when submitting the search form. Lastly, the POST request body can be built from your search input using `urlencode()`, which encodes `q` correctly:
```python
from operator import itemgetter
from bs4 import BeautifulSoup
import http.cookiejar
import urllib.request
import urllib.parse

song = input('Enter song: ')

# An opener that stores cookies and sends them back on later requests
cookie_jar = http.cookiejar.CookieJar()
cookie_processor = urllib.request.HTTPCookieProcessor(cookie_jar)
opener = urllib.request.build_opener(cookie_processor)

# Step 1: load the homepage to receive the session cookie and the _csrf token
with opener.open('https://songbpm.com') as response:
    html_1 = response.read().decode('utf-8')
    soup_1 = BeautifulSoup(html_1, 'html.parser')
    # Assumes the first <input> on the page holds the _csrf token
    data = urllib.parse.urlencode({'q' : song, '_csrf' : soup_1.input['value']}).encode('ascii')

# Step 2: POST the form data to /searches (passing `data` makes this a POST)
with opener.open('https://songbpm.com/searches', data) as response:
    html_2 = response.read().decode('utf-8')
    soup_2 = BeautifulSoup(html_2, 'html.parser')

    # Each result is an <a class="media">; print the <p> texts at positions 0, 1 and 4
    for a in soup_2.find_all('a', {'class' : 'media'}):
        print(', '.join(itemgetter(0, 1, 4)([p.get_text(strip=True) for p in a.find_all('p')])))
```
This would give you the following results:
```
Enter song: clean bandit - solo
Clean Bandit, Solo (feat. Demi Lovato), 105
Clean Bandit, Solo (feat. Demi Lovato) - Acoustic, 0
Clean Bandit, Solo (feat. Demi Lovato) - Ofenbach Remix, 121
Clean Bandit, Solo (feat. Demi Lovato) - Sofi Tukker Remix, 127
Clean Bandit, Solo (feat. Demi Lovato) - Wideboys Remix, 122
```
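To see why both the cookie and the `_csrf` token matter, here is a hedged, fully offline sketch of the two-step flow described above. It starts a hypothetical toy server on 127.0.0.1 that imitates the pattern (a session cookie plus a hidden `_csrf` form field) and then performs the GET-then-POST sequence with and without a `CookieJar`. The server, the `TOKEN` value, the `session=s1` cookie and the helper names (`ToyHandler`, `CsrfFinder`, `search`) are invented for illustration and are not songbpm.com's real behavior. `html.parser.HTMLParser` is swapped in for BeautifulSoup so that only the standard library is needed.

```python
import http.cookiejar
import http.server
import threading
import urllib.parse
import urllib.request
from html.parser import HTMLParser

TOKEN = 'abc123'  # hypothetical token value used by the toy server


class ToyHandler(http.server.BaseHTTPRequestHandler):
    """Hypothetical stand-in for a site that needs a session cookie and a _csrf token."""

    def log_message(self, *args):  # silence the default request logging
        pass

    def _reply(self, body, cookie=None):
        self.send_response(200)
        if cookie:
            self.send_header('Set-Cookie', cookie)
        self.send_header('Content-Type', 'text/html; charset=utf-8')
        self.send_header('Content-Length', str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def do_GET(self):
        # Homepage: set a session cookie and embed the token in a hidden form field
        form = f'<form><input type="hidden" name="_csrf" value="{TOKEN}"></form>'
        self._reply(form.encode('utf-8'), cookie='session=s1; Path=/')

    def do_POST(self):
        # Search: only answer if both the cookie and the token came back
        length = int(self.headers.get('Content-Length', 0))
        fields = urllib.parse.parse_qs(self.rfile.read(length).decode('utf-8'))
        has_cookie = 'session=s1' in self.headers.get('Cookie', '')
        has_token = fields.get('_csrf') == [TOKEN]
        query = fields.get('q', [''])[0]
        answer = f'results for {query}' if has_cookie and has_token else 'no results'
        self._reply(answer.encode('utf-8'))


class CsrfFinder(HTMLParser):
    """Standard-library replacement for BeautifulSoup: reads <input name="_csrf">."""

    def __init__(self):
        super().__init__()
        self.token = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == 'input' and attrs.get('name') == '_csrf':
            self.token = attrs.get('value')


def search(base, query, use_cookies=True):
    # ProxyHandler({}) disables any proxy from the environment for this local demo
    handlers = [urllib.request.ProxyHandler({})]
    if use_cookies:
        handlers.append(urllib.request.HTTPCookieProcessor(http.cookiejar.CookieJar()))
    opener = urllib.request.build_opener(*handlers)

    # Step 1: GET the homepage (stores the cookie if a jar is present, reads the token)
    with opener.open(base + '/') as response:
        finder = CsrfFinder()
        finder.feed(response.read().decode('utf-8'))

    # Step 2: POST the form; passing bytes as `data` makes urllib send a POST
    data = urllib.parse.urlencode({'q': query, '_csrf': finder.token}).encode('ascii')
    with opener.open(base + '/searches', data) as response:
        return response.read().decode('utf-8')


server = http.server.HTTPServer(('127.0.0.1', 0), ToyHandler)  # port 0: pick a free port
threading.Thread(target=server.serve_forever, daemon=True).start()
base = f'http://127.0.0.1:{server.server_address[1]}'
try:
    with_jar = search(base, 'clean bandit - solo')
    without_jar = search(base, 'clean bandit - solo', use_cookies=False)
finally:
    server.shutdown()
    server.server_close()

print(with_jar)     # results for clean bandit - solo
print(without_jar)  # no results
```

In this toy setup the search only succeeds when the cookie-aware opener is used, because the token alone is not enough without the session cookie it belongs to. On the real site the cookie names and the location of the token may differ, so treat the server half purely as a model of the mechanism.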
Using BeautifulSoup makes it easy to extract all the details. `itemgetter()` is just a quick way to pick several items out of a list by their positions.
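As a minimal illustration of what `itemgetter(0, 1, 4)` does in the last line of the code, here it is applied to a made-up list of `<p>` texts. The list contents are hypothetical (the two placeholder strings stand in for whatever the real page puts at positions 2 and 3); only positions 0, 1 and 4 are picked.

```python
from operator import itemgetter

# Hypothetical texts of the <p> tags inside one result link; the real page
# layout may differ, only positions 0, 1 and 4 matter here.
texts = ['Clean Bandit', 'Solo (feat. Demi Lovato)', 'placeholder-2', 'placeholder-3', '105']

pick = itemgetter(0, 1, 4)  # callable that returns a tuple of those items
picked = pick(texts)
line = ', '.join(picked)
print(line)  # Clean Bandit, Solo (feat. Demi Lovato), 105

# Equivalent without itemgetter
assert picked == (texts[0], texts[1], texts[4])
```

Note that `itemgetter` with a single index, e.g. `itemgetter(4)`, returns the item itself rather than a 1-tuple, so `', '.join(...)` would then join the characters of that one string.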