HtmlUnit - Selecting Forms, CheckBoxes, TextFields, and Submit Buttons


I have been experimenting with HtmlUnit for a little while, using this website in particular because it has quite a few features that I want to get used to. I have posted about it before, but that was mainly about grabbing information off the site, which ended up working. Now I want to fill in a form and submit it.

Current Test Code:

import com.gargoylesoftware.htmlunit.BrowserVersion
import com.gargoylesoftware.htmlunit.WebClient

def url = "http://www.hidemyass.com/proxy-list/"

client = new WebClient(BrowserVersion.FIREFOX_3)
client.javaScriptEnabled = false

page = client.getPage(url)
form = page.getFormByName("proxyform")

//get portInputField and set value
portField = form.getInputByName("p")
portField.setValueAttribute("80")

//select checkbox 1 & 2 from anonymity level
//click "Update Results"
//get new page url
//grab information
//save

The commented-out steps are where I am unsure of what to do. I went ahead and made an attempt, but would like input on what I should be doing.

Attempt:

def url = "http://www.hidemyass.com/proxy-list/"

page = client.getPage(url)

// keep a reference to the field itself rather than the result of setValueAttribute
portField = page.getHtmlElementById("ports")
portField.setValueAttribute("80")

submitButton = page.getByXPath("/html/body//form//input[@type='image']")
page2 = submitButton.get(0).click()

println page2    

The snippet above prints out: HtmlPage(http://www.hidemyass.com/proxy-list/search-1)@17168934

I'm looking to get a new page where I can then parse the information from the search. Any ideas?

I don't believe the language I am using should make too much of a difference; however, I am using Groovy.

EDIT

I managed to get what I wanted, but it is returned like so:

HtmlPage(http://www.hidemyass.com/proxy-list/search-1)@23713629
<?xml version="1.0" encoding="UTF-8"?><td>109.123.00.00</td>

Is there a way to get only the information that I'm looking for: <td>109.123.00.00</td>, or do I just need to strip the info from it manually?

EDIT

.asText() solved my issue, but gave quite a few warnings regarding the CSS. Should I be worried?


Solution

  • Is there a way to get only the information that I'm looking for: 109.123.00.00, or do I just need to strip the info from it manually?

    This should work:

    // getElementByName matches the name attribute, so look the cell up by tag via XPath
    def td = page2.getFirstByXPath("//td")
    assert td.textContent == "109.123.00.00"
    

    See the JavaDoc for HtmlPage for other ways to extract information from a page. Don't parse the page manually.

    Side note: Since you are already using Groovy, you could also have a look at Geb, a popular Groovy-based web automation and testing tool that's more convenient to use than HtmlUnit.
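
For the steps left as comments in the question (select the checkboxes, click "Update Results", grab the information), a rough sketch might look like the following. It assumes the HtmlUnit API of the same era as the question's code (`getInputsByName`, `setChecked`, `getFirstByXPath`, `getByXPath`). The helper names `submitSearch` and `cellTexts` are made up for illustration, and the checkbox group name and values are placeholders, since the question does not show the form's markup.

```groovy
// Sketch only: `form` is expected to be an HtmlUnit HtmlForm and `page` an HtmlPage.
// submitSearch and cellTexts are illustrative names, not part of HtmlUnit.

// Set the port, tick the requested checkboxes, and submit the form.
// checkboxName and values are placeholders; take the real ones from the page source.
def submitSearch(form, String port, String checkboxName, List<String> values) {
    // "p" is the port field name used in the question
    form.getInputByName("p").setValueAttribute(port)

    // Tick each checkbox in the group whose value attribute was requested
    def boxes = form.getInputsByName(checkboxName)
    boxes.findAll { it.valueAttribute in values }.each { it.setChecked(true) }

    // The submit control is an image input; click() returns the resulting page
    form.getFirstByXPath(".//input[@type='image']").click()
}

// Collect the trimmed text of every <td> on the page.
def cellTexts(page) {
    page.getByXPath("//td").collect { it.textContent.trim() }
}

// Possible usage ("level" and the values "1", "2" are hypothetical):
// def page2 = submitSearch(page.getFormByName("proxyform"), "80", "level", ["1", "2"])
// cellTexts(page2).each { println it }
```

Note that `getFirstByXPath` returns null when nothing matches, so a real script would probably want to check for the submit input before calling `click()`.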