I was trying to figure out how to deal with the cryptic "__VIEWSTATE" and its companion fields when you request (POST) a page with Python. They are the main source of problems in my scripts, and while looking for answers / solutions I realized that there are very few out there (almost none!).
In this topic: Unable to load ASP.NET page using Python urllib2, you can see my solution, which consists of parsing the values of the cryptic fields every time you load the page. That works, but it feels quite stupid, actually :-)
for result in the_page.findAll('input', attrs={'name' : '__VIEWSTATE'}):
    view_state = result['value']
for result_1 in the_page.findAll('input', attrs={'name' : '__EVENTVALIDATION'}):
    event_validation = result_1['value']
for result_2 in the_page.findAll('input', attrs={'name' : '__PREVIOUSPAGE'}):
    previous_page = result_2['value']
for result in the_page.findAll('input', attrs={'name' : '__EVENTTARGET'}):
    event_target = result['value']
And then:
import urllib
import urllib2
# Assumption: `soup` is BeautifulSoup 3, imported under that alias.
from BeautifulSoup import BeautifulSoup as soup

url = 'http://bandscore.ielts.org/search.aspx'
# `page` and `Country` are set elsewhere in the script.
values = {
    '__EVENTTARGET' : 'gdvSearchResults',
    '__EVENTARGUMENT' : page,
    '__VIEWSTATE' : view_state,
    '__PREVIOUSPAGE' : previous_page,
    '__EVENTVALIDATION' : event_validation,
    'DropDownList1' : Country,
    #'txtSearchInstitution' : '',
    #'hdnSearchText' : '',
    #'rdoFilter': '%25',
}
user_agent = 'Mozilla/5 (Solaris 10) Gecko'
headers = { 'User-Agent' : user_agent }
data = urllib.urlencode(values)
req = urllib2.Request(url, data, headers)
response = urllib2.urlopen(req)
thePage = response.read()
# Re-parse the response so the fresh hidden fields can be read for the next POST
the_page = soup(thePage)
So here are a few more links with good explanations; some of them propose solutions:
What does the __VIEWSTATE hold?
http://aspalliance.com/articleViewer.aspx?aId=135&pId=
http://msdn.microsoft.com/en-us/library/system.web.ui.losformatter.aspx
http://weblogs.asp.net/infinitiesloop/archive/2006/08/03/Truly-Understanding-Viewstate.aspx
http://msdn.microsoft.com/en-us/library/ms972976.aspx
Mechanize does not see some hidden form inputs?
Unable to load ASP.NET page using Python urllib2
I realize that a lot of people are trying to find a good way to deal with that, so let's try to find a good solution, all together ;-)
EDIT1: Found this too, might be interesting: http://code.google.com/p/peekviewstate/source/browse/trunk/src/peekviewstate_example.py
(Sorry if this post is not perfect / full of good info ... I'm quite a n00b but I try hard)
How to deal with it? Just think of __VIEWSTATE as opaque data sent to you by the server. It contains data specific to the given page and the state of its objects, and I don't recommend modifying it.
If you want to emulate a browser talking to an ASP.NET application, you need to include those fields in your POST request, so the server can reconstruct the page's state.
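As a rough illustration of "send back whatever the server gave you", here is a minimal sketch that collects every hidden input on a page in one pass instead of looking each field up by name, then builds the POST request. It is only a sketch under a few assumptions: it uses the Python 3 standard library (html.parser, urllib) rather than the urllib2 / BeautifulSoup setup from the question, and the names HiddenInputCollector, hidden_fields and build_postback are made up for this example.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode
from urllib.request import Request


class HiddenInputCollector(HTMLParser):
    """Collect name/value pairs of every <input type="hidden"> on a page."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        attr = dict(attrs)
        # Keep only named hidden inputs; a missing value becomes an empty string.
        if (attr.get("type") or "").lower() == "hidden" and attr.get("name"):
            self.fields[attr["name"]] = attr.get("value") or ""


def hidden_fields(html):
    """Return a dict with all hidden form fields found in the HTML text."""
    parser = HiddenInputCollector()
    parser.feed(html)
    parser.close()
    return parser.fields


def build_postback(url, html, overrides):
    """Build (but do not send) a POST request that echoes the hidden fields.

    `overrides` holds the fields you set yourself, for example
    __EVENTTARGET, __EVENTARGUMENT or a drop-down value.
    """
    values = hidden_fields(html)   # __VIEWSTATE and friends, left untouched
    values.update(overrides)       # your own fields win over the page's
    data = urlencode(values).encode("ascii")
    return Request(url, data=data, headers={"User-Agent": "Mozilla/5.0"})
```

You would pass the HTML of the last response plus your own fields (for instance '__EVENTTARGET': 'gdvSearchResults' from the question) to build_postback, and hand the result to urllib.request.urlopen. The point of the design is that the script never needs to know which hidden fields a given page uses.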
What exact problem is it causing? I think your solution is pretty straightforward.
Btw, just as a side note: a lot of ASP.NET applications expose a public API, which can be used instead of parsing their pages.