python ajax selenium mechanize mechanize-python

Can mechanize support ajax / filling out forms via javascript?

I'm trying to create a program that will fill out a form on this site: Insurance survey

I'm using python 2.7 and mechanize after numerous attempts with 3.4 and realizing mechanize doesn't work with 3.4. I'm a novice but have learned a LOT in trying to do this (python is awesome).

import mechanize
br = mechanize.Browser() 
urlofmypage = 'https://interactive.web.insurance.ca.gov/survey/'
br.open(urlofmypage) 
print br.geturl()
br.select_form(nr=0)

br['location'] = ['ALAMEDA BERKELEY']   #SET FORM ENTRIES
br['coverageType'] = ['HOMEOWNERS']
br['coverageAmount'] = ['$150,000']
br['homeAge'] = ['1-3 Years']

result = br.submit()
print result

This is my error : mechanize._form.ItemNotFoundError: insufficient items with name '$150,000'

The problem is, only after I fill out the form fields location and coverageType then do the options for coverageAmount show up :( . I've been messing around with this and watching numerous videos online and all my research has led me to conclude that mechanize won't do this.

I've also read that this is an ajax call, and mechanize won't work for this. Things seem to be pointing towards selenium webdriver... Does anybody have any input?

Solution

AJAX calls are performed by javascript, and mechanize has no way to run javascript. Mechanize only looks at form fields on a static HTML page and allows you to fill & submit those. This is why your research is pointing you towards things like Selenium or Ghost, which run on top of a real browser that can execute javascript.

There is a simpler way to do this, though! If you open your browser's developer tools (e.g. the Network tab in Firefox or Chrome) and fill out the form, you can see the request your browser makes behind the scenes, even when it is sent via AJAX.

This tells you:

- The browser made a POST request
- To this URL: https://interactive.web.insurance.ca.gov/survey/survey?type=homeownerSurvey&event=HOMEOWNERS
- With the following form params:
  - location=ALAMEDA+ALAMEDA
  - coverageType=HOMEOWNERS
  - coverageAmount=150000
  - homeAge=New

You can use this information to make the same POST request in Python:

import urllib.parse, urllib.request

url = "https://interactive.web.insurance.ca.gov/survey/survey?type=homeownerSurvey&event=HOMEOWNERS"
data = urllib.parse.urlencode(dict(
    location="ALAMEDA ALAMEDA",
    coverageType="HOMEOWNERS",
    coverageAmount="150000",
    homeAge="New",
))
res = urllib.request.urlopen(url, data.encode("utf8"))

print(res.read())

This is Python 3 code (in Python 2.7 the equivalent functions are urllib.urlencode and urllib2.urlopen). The requests library provides an even nicer API for making HTTP requests.
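If you want to compare your request against what the developer tools showed before actually sending anything, one option is to build the request object and inspect it. This is only a sketch using the standard library; the variable names params, body and req are mine, and nothing is sent to the server until you call urlopen(req).

```python
import urllib.parse
import urllib.request

url = "https://interactive.web.insurance.ca.gov/survey/survey?type=homeownerSurvey&event=HOMEOWNERS"
params = {
    "location": "ALAMEDA ALAMEDA",
    "coverageType": "HOMEOWNERS",
    "coverageAmount": "150000",
    "homeAge": "New",
}

# urlencode turns the dict into "key=value&key=value", with spaces as "+"
body = urllib.parse.urlencode(params).encode("utf8")

# Build the request only; no network traffic happens here
req = urllib.request.Request(url, data=body)

print(req.get_method())  # the method is POST because a data payload is attached
print(req.data)          # the form body as bytes, in the same shape the Network tab shows
```

If the printed body matches the form params from the Network tab, passing req to urllib.request.urlopen should reproduce what the browser did.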

Edit: In response to your three questions:

is it possible for the dictionary that you've created to have more than 1 location and cycle through them using a for loop?

Yes, just add a loop around the code and pass a different value for location each time. I would put this code into a function to make the code cleaner, like this:

https://gist.github.com/lost-theory/08786e3a27c8d8ce3839
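In case the link is unavailable, here is a rough sketch of the same idea (not a copy of the gist). The helper names build_survey_request and fetch_survey are hypothetical, and I am assuming the server accepts "ALAMEDA BERKELEY" as a location value because that is what the form in the question used; check the real values in the Network tab.

```python
import urllib.parse
import urllib.request

URL = "https://interactive.web.insurance.ca.gov/survey/survey?type=homeownerSurvey&event=HOMEOWNERS"


def build_survey_request(location, coverage_amount="150000", home_age="New"):
    """Build (but do not send) the POST request for one location."""
    data = urllib.parse.urlencode({
        "location": location,
        "coverageType": "HOMEOWNERS",
        "coverageAmount": coverage_amount,
        "homeAge": home_age,
    })
    return urllib.request.Request(URL, data=data.encode("utf8"))


def fetch_survey(location):
    """Send the request and return the raw HTML bytes (this hits the network)."""
    with urllib.request.urlopen(build_survey_request(location)) as res:
        return res.read()


# Assumed location values; replace with the ones the site really uses
locations = ["ALAMEDA ALAMEDA", "ALAMEDA BERKELEY"]

# Loop over the locations, building one request per location
prepared = [build_survey_request(loc) for loc in locations]
for loc, req in zip(locations, prepared):
    print(loc, "->", req.data)
```

To actually download the results, call fetch_survey(loc) inside the loop instead of only building the request.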

the results are in a lot of jibberish, so I'd have to find a way to sift through it huh. Like pick out which is which

Yes, the gibberish is HTML that you will need to parse to collect the data you're looking for. Look at HTMLParser in the Python standard library (the html.parser module in Python 3), or install a library like lxml or BeautifulSoup, which have a somewhat nicer API. You can also just try parsing the text by hand using str.split.

If you want to convert the table's rows into python lists you'll need to find all the rows, which look like this:

  <tr Valign="top">
    <td align="left">Bankers Standard <a href='http://interactive.web.insurance.ca.gov/companyprofile/companyprofile?event=companyProfile&doFunction=getCompanyProfile&eid=5906'><small>(Info)</small></a></td>
    <td align="left"><div align="right">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;N/A</td>
    <td align="left"><div align="right">250</div></td>
    <td align="left">&nbsp;</td>
    <td align="left">Bankers Standard <a href='http://interactive.web.insurance.ca.gov/companyprofile/companyprofile?event=companyProfile&doFunction=getCompanyProfile&eid=5906'><small>(Info)</small></a></td>
    <td align="left"><div align="right">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;1255</td>
    <td align="left"><div align="right">500</div></td>
  </tr>

You want to loop over all the <tr> (row) elements, grabbing all the <td> (column) elements inside each row, then clean up the text in each column (removing those &nbsp; spaces, etc.).
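As a minimal sketch of that loop using only the standard library, something like the following could work. The RowParser class is my own illustration, and the sample string is a shortened version of the row above (with the link's query string dropped), so the real page may need extra handling such as skipping header rows.

```python
from html.parser import HTMLParser


class RowParser(HTMLParser):
    """Collect the text of every <td> inside every <tr> as a list of lists."""

    def __init__(self):
        super().__init__()  # character references like &nbsp; are converted by default
        self.rows = []
        self._row = None    # cells of the row currently being read
        self._cell = None   # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag == "td" and self._cell is not None:
            # &nbsp; arrives as "\xa0"; turn it into a plain space, then trim
            text = "".join(self._cell).replace("\xa0", " ").strip()
            self._row.append(text)
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


sample = """
<tr Valign="top">
  <td align="left">Bankers Standard <a href='http://interactive.web.insurance.ca.gov/companyprofile/companyprofile'><small>(Info)</small></a></td>
  <td align="left"><div align="right">&nbsp;&nbsp;N/A</td>
  <td align="left"><div align="right">250</div></td>
  <td align="left">&nbsp;</td>
</tr>
"""

parser = RowParser()
parser.feed(sample)
parser.close()
print(parser.rows)
```

Each inner list is one table row, so you can index into it to pick out the company name, premium and deductible columns.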

There are lots of questions on StackOverflow and tutorials on the internet on how to parse or scrape HTML in python, like this or this.

could you explain why we had to do the data.encode line

Sure! In the documentation for urlopen, it says:

data must be a bytes object specifying additional data to be sent to the server, or None if no such data is needed.

The urlencode function returns a unicode string, and if we try to pass that into urlopen, we get this error:

TypeError: POST data should be bytes or an iterable of bytes. It cannot be of type str.

So we use data.encode('utf8') to convert the unicode string to bytes. You typically need to use bytes for input & output like reading from or writing to files on disk, sending or receiving data over the network like HTTP requests, etc. This presentation has a good explanation of bytes vs. unicode strings in python and why you need to decode/encode when doing I/O.
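Here is a small sketch of the two directions, with no network involved. The variable names are mine, and decoding with utf8 is an assumption for this example; for a real response you should use whatever charset the server declares.

```python
import urllib.parse

# urlencode gives back a str (unicode text)
text = urllib.parse.urlencode({"location": "ALAMEDA ALAMEDA"})
print(type(text), text)

# encode turns the str into bytes, which is what urlopen wants for POST data
payload = text.encode("utf8")
print(type(payload), payload)

# decode goes the other way, e.g. for the bytes returned by res.read()
round_trip = payload.decode("utf8")
print(round_trip == text)
```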