I am trying to fill in the form at https://www.cancer.duke.edu/Nomogram/firstlinechemotherapy.html with multiple sets of values and collect the results. Note that the URL does not change when the form is submitted (the Validate button).
I have tried filling in the form with Mechanize and extracting the result with BeautifulSoup, but I cannot work out how to get the response, because the URL never changes.
import urllib.request
from urllib.request import urlopen
from bs4 import BeautifulSoup as bsoup
import mechanize
#Fill form with mechanize
br = mechanize.Browser()
br.open("https://www.cancer.duke.edu/Nomogram/firstlinechemotherapy.html")
response = br.response()
mech=response.read()
br.select_form(id='myform')
br.form['alb']='7'
br.form['hemo']='17'
br.form['alkph']='5000'
br.form['psa']='5000'
br.submit()
#Extract Output
url = urllib.request.urlopen("https://www.cancer.duke.edu/Nomogram/firstlinechemotherapy.html")
content = url.read()
soup= bsoup(content,"html.parser")
riskValue=soup.find('div',{'id':'resultPanelRisk3'})
tableValue=riskValue.find('table')
trValue=tableValue.find_all('tr')[1]
LowValue=trValue.find('td',{'id':'Risk3Low'}).string
IntermediateValue=trValue.find('td',{'id':'Risk3Intermediate'}).string
HighValue=trValue.find('td',{'id':'Risk3High'}).string
With the above code, LowValue is '*', whereas the expected LowValue for the form values above is 'Yes'.
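It would be easier and more efficient to do this with the requests library. Judging by the request URL used here, the page appears to fetch its results from a separate EquationServer endpoint, so you can query that endpoint directly instead of submitting the form. Your code should look something like this: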
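import requests

# Form values to submit
alb = '7'
hemo = '17'
alkph = '5000'
psa = '5000'

# Query the EquationServer endpoint directly; the form values go in the query string
url = f"https://www.cancer.duke.edu/Nomogram/EquationServer?pred=1&risk=1&lnm=0&bm=0&visc=0&pain=0&ldh=0&psanew=0&alb={alb}&hemo={hemo}&alkph={alkph}&psa={psa}&equationName=90401&patientid=&comment=&_=1556956911136"
req = requests.get(url).text

# Keep only the comma-separated values that follow "Row6="
results = req[req.index("Row6=")+5:].strip().split(",")

# '1' means 'Yes'; anything else (e.g. 'NA') means 'No'
results_transform = ['Yes' if x == '1' else 'No' for x in results]

# Elements 2 to 4 are Risk3Low, Risk3Intermediate and Risk3High
LowValue = results_transform[2]
IntermediateValue = results_transform[3]
HighValue = results_transform[4]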
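Since the question asks about multiple sets of values, here is a minimal sketch of building one request URL per set with the standard library. The fixed parameters are copied from the URL above; build_url, FIXED_PARAMS and the second input set are hypothetical names and values for illustration, and I am assuming the trailing "_" parameter is only a cache-busting timestamp, which I have not verified against the server.

```python
import time
from urllib.parse import urlencode

BASE_URL = "https://www.cancer.duke.edu/Nomogram/EquationServer"

# Fixed query parameters, copied from the URL used in the answer.
FIXED_PARAMS = {
    "pred": "1", "risk": "1", "lnm": "0", "bm": "0", "visc": "0",
    "pain": "0", "ldh": "0", "psanew": "0",
    "equationName": "90401", "patientid": "", "comment": "",
}


def build_url(alb, hemo, alkph, psa):
    """Return the EquationServer URL for one set of form values."""
    params = dict(FIXED_PARAMS, alb=alb, hemo=hemo, alkph=alkph, psa=psa)
    # Assumption: "_" is just a cache-busting timestamp in milliseconds.
    params["_"] = int(time.time() * 1000)
    return BASE_URL + "?" + urlencode(params)


# Hypothetical input sets; the first matches the values in the question.
cases = [("7", "17", "5000", "5000"), ("4", "12", "100", "50")]
urls = [build_url(*case) for case in cases]
for u in urls:
    print(u)
```

Each URL can then be passed to requests.get(...) exactly as in the code above, one call per set of values.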
PS: the results variable outputs something like this:
['NA', 'NA', '1', 'NA', 'NA']
where the last three elements are Risk3Low, Risk3Intermediate and Risk3High respectively. Furthermore, "NA" = "No" and "1" = "Yes".
That is why I'm using results_transform to transform
['NA', 'NA', '1', 'NA', 'NA']
into:
['No', 'No', 'Yes', 'No', 'No']
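If you want to check the parsing step without hitting the server, the same logic can be wrapped in small functions and run on a saved response. This is only a sketch: parse_row6 and to_yes_no are hypothetical helper names, and the sample text is made up (only the "Row6=" part reflects what the real response is known to contain).

```python
def parse_row6(text):
    """Return the comma-separated values that follow 'Row6=' in the response."""
    marker = "Row6="
    start = text.find(marker)
    if start == -1:
        raise ValueError("response does not contain 'Row6='")
    return text[start + len(marker):].strip().split(",")


def to_yes_no(values):
    """Map '1' to 'Yes' and everything else (e.g. 'NA') to 'No'."""
    return ["Yes" if v == "1" else "No" for v in values]


# Made-up response body for illustration; the real one may contain more lines.
sample = "Row5=0.5\nRow6=NA,NA,1,NA,NA\n"

flags = to_yes_no(parse_row6(sample))
low, intermediate, high = flags[2:5]
print(low, intermediate, high)  # prints: Yes No No
```

Raising an error when "Row6=" is missing makes it easier to notice if the server ever returns something unexpected.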
I hope this helps