Tags: python, forms, beautifulsoup, mechanize

Combine BeautifulSoup and Mechanize to fill a form and retrieve results from the same URL


I am trying to fill the form on https://www.cancer.duke.edu/Nomogram/firstlinechemotherapy.html with multiple values and get the results. Please note that the URL does not change when the form is submitted with the Validate button.

I have tried to fill the form with Mechanize and extract the result with BeautifulSoup, but I cannot work out how to receive the response, since the URL never changes.

import urllib.request
from urllib.request import urlopen
from bs4 import BeautifulSoup as bsoup
import mechanize

#Fill form with mechanize
br = mechanize.Browser()
br.open("https://www.cancer.duke.edu/Nomogram/firstlinechemotherapy.html")
response = br.response()
mech=response.read()
br.select_form(id='myform')
br.form['alb']='7'
br.form['hemo']='17'
br.form['alkph']='5000'
br.form['psa']='5000'
br.submit()

#Extract Output
url = urllib.request.urlopen("https://www.cancer.duke.edu/Nomogram/firstlinechemotherapy.html")
content = url.read()
soup= bsoup(content,"html.parser")
riskValue=soup.find('div',{'id':'resultPanelRisk3'})
tableValue=riskValue.find('table')
trValue=tableValue.find_all('tr')[1]
LowValue=trValue.find('td',{'id':'Risk3Low'}).string
IntermediateValue=trValue.find('td',{'id':'Risk3Intermediate'}).string
HighValue=trValue.find('td',{'id':'Risk3High'}).string

With the above code, LowValue is '*', whereas the expected LowValue for the form values above is 'Yes'.


Solution

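  • It would be easier and more efficient to do this with the requests library. Judging by the request below, the page appears to fetch its results from a separate EquationServer endpoint instead of reloading, which would explain why the URL never changes and why re-downloading the HTML only returns the '*' placeholders. You can call that endpoint directly, so your code could look something like this: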

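    import requests

    # Form values to submit (sent as query-string parameters)
    alb = '7'
    hemo = '17'
    alkph = '5000'
    psa = '5000'

    # Endpoint that returns the computed results as plain text
    url = f"https://www.cancer.duke.edu/Nomogram/EquationServer?pred=1&risk=1&lnm=0&bm=0&visc=0&pain=0&ldh=0&psanew=0&alb={alb}&hemo={hemo}&alkph={alkph}&psa={psa}&equationName=90401&patientid=&comment=&_=1556956911136"
    response_text = requests.get(url).text

    # Take the comma-separated values that follow "Row6="
    results = response_text[response_text.index("Row6=") + 5:].strip().split(",")
    # "1" means Yes; anything else (e.g. "NA") is treated as No
    results_transform = ['Yes' if x == '1' else 'No' for x in results]

    # The last three values are Risk3Low, Risk3Intermediate and Risk3High
    LowValue = results_transform[2]
    IntermediateValue = results_transform[3]
    HighValue = results_transform[4]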
    

    P.S. The results variable holds something like this:

    ['NA', 'NA', '1', 'NA', 'NA']
    

    where the last three elements are Risk3Low, Risk3Intermediate and Risk3High respectively. Furthermore, "NA" means "No" and "1" means "Yes".

    That is why I use results_transform to turn

    ['NA', 'NA', '1', 'NA', 'NA']
    

    into:

    ['No', 'No', 'Yes', 'No', 'No']
    

    I hope this helps
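
The same idea can be split into two small helpers, which makes the parsing step testable without hitting the server. This is only a sketch under a few assumptions: the function names build_url and parse_risk_row are made up for illustration, the parameter names are copied from the URL in the answer above, the trailing `_` parameter is left out on the assumption that the server does not need it, and the Row6 values are assumed to end at the next line break.

```python
from urllib.parse import urlencode

# Endpoint and parameter names are copied from the URL in the answer above.
BASE_URL = "https://www.cancer.duke.edu/Nomogram/EquationServer"


def build_url(alb, hemo, alkph, psa):
    """Build the query URL for the four form values (hypothetical helper)."""
    params = {
        "pred": 1, "risk": 1, "lnm": 0, "bm": 0, "visc": 0, "pain": 0,
        "ldh": 0, "psanew": 0,
        "alb": alb, "hemo": hemo, "alkph": alkph, "psa": psa,
        "equationName": 90401, "patientid": "", "comment": "",
    }
    # urlencode keeps the dict's insertion order and escapes the values
    return f"{BASE_URL}?{urlencode(params)}"


def parse_risk_row(text, marker="Row6="):
    """Return (low, intermediate, high) as 'Yes'/'No' (hypothetical helper)."""
    start = text.find(marker)
    if start == -1:
        raise ValueError(f"{marker!r} not found in response")
    # Assumption: the values for this row end at the next line break
    line = text[start + len(marker):].split("\n", 1)[0]
    values = [v.strip() for v in line.split(",")]
    if len(values) < 5:
        raise ValueError(f"expected at least 5 values, got {len(values)}")
    # "1" means Yes; anything else (e.g. "NA") is treated as No
    labels = ["Yes" if v == "1" else "No" for v in values]
    return tuple(labels[2:5])


if __name__ == "__main__":
    # Sample text modelled on the output shown in the answer, not a live response
    sample = "Row6=NA,NA,1,NA,NA"
    print(build_url("7", "17", "5000", "5000"))
    print(parse_risk_row(sample))  # ('Yes', 'No', 'No')
```

To use it against the live page, pass the text of requests.get(build_url(...)) to parse_risk_row. Raising ValueError when "Row6=" is missing makes a changed response format fail loudly instead of silently returning wrong values.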