I would like to do the following.
1) Goto https://www.wunderground.com/history
2) Submit the form with the following values:
Location = 'Los Angeles, CA'
Month = 'November'
Day = '02'
Year = '2017'
3) Retrieve the response and save to a file
Here is what I have at the moment, but does not work:
import requests
url = 'https://www.wunderground.com/history'
payload = {'code': 'Los Angeles, CA', 'month': 'November', 'day': '2', 'year':'2017'}
r = requests.post(url, params=payload)
with open('test.html', 'w') as f:
f.write(r.text)
I'm not getting the expected response and not sure If I am using requests properly or not.
I know there's an API from wunderground but prefer not to use it at the moment.
The content of test.html is basically the original page with no historical data.
I'm expecting this page:
You cannot blindly send payloads to some websites and expect a good result. First, look at the source code of the form
element. I removed some unimportant parts:
<form action="/cgi-bin/findweather/getForecast" method="get" id="trip">
<input type="hidden" value="query" name="airportorwmo" />
<input type="hidden" value="DailyHistory" name="historytype" />
<input type="hidden" value="/history/index.html" name="backurl" />
<input type="text" value="" name="code" id="histSearch" />
<select class="month" name="month">
<option value="1">January</option>
<option value="2">February</option>
...
<option value="12">December</option>
</select>
<select class="day" name="day">
<option>1</option>
<option>2</option>
...
<option>31</option>
</select>
<select class="year" name="year">
<option>2018</option>
<option>2017</option>
...
<option>1945</option>
</select>
<input type="submit" value="Submit" class="button radius" />
</form>
First, from the method
attribute of the form
element you can see that you have to use the GET method, not POST, to send the payload. Second, from the action
attribute you can also see that you should send that payload to this specific URL:
https://www.wunderground.com/cgi-bin/findweather/getForecast
The payload itself are not just values you want to send. In many cases there are additional values that has to be sent in order for the web server to respond correctly. It is usually best to either send everything (basically every name
attribute) or inspect what the website actually sends.
This code works for me:
import requests
URL = 'https://www.wunderground.com/cgi-bin/findweather/getForecast'
CODE = 'Los Angeles, CA'
DAY = 2
MONTH = 11
YEAR = 2017
params = {
'airportorwmo': 'query',
'historytype': 'DailyHistory',
'backurl': '/history/index.html',
'code': CODE,
'day': DAY,
'month': MONTH,
'year': YEAR,
}
r = requests.get(URL, params=params)
print(r.text)