I'm trying to learn Python, so I decided to write a script that translates text using Google Translate. So far I have written this:
import sys
from BeautifulSoup import BeautifulSoup
import urllib2
import urllib
data = {'sl':'en','tl':'it','text':'word'}
request = urllib2.Request('http://www.translate.google.com', urllib.urlencode(data))
request.add_header('User-Agent', 'Mozilla/5.0 (Windows; U; Windows NT 5.1; it; rv:1.8.1.11) Gecko/20071127 Firefox/2.0.0.11')
opener = urllib2.build_opener()
feeddata = opener.open(request).read()
#print feeddata
soup = BeautifulSoup(feeddata)
print soup.find('span', id="result_box")
print request.get_method()
And now I'm stuck. I can't see any bugs in it, but it still doesn't work (by that I mean the script runs, but it won't translate the word).
Does anyone know how to fix it? (Sorry for my poor English)
Google Translate is meant to be used with a GET request and not a POST request. However, urllib2 will automatically submit a POST if you add any data to your request.
The solution is to construct the url with a querystring so you will be submitting a GET.
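The GET/POST difference can be checked locally, without sending anything to Google. The sketch below is written for Python 3, where `urllib.request` and `urllib.parse` take over from the `urllib2` and `urllib` modules used in the question; the variable names `base_url`, `post_request` and `get_request` are only illustrative.

```python
# Python 3 sketch: urllib.request / urllib.parse replace Python 2's urllib2 / urllib.
from urllib.parse import urlencode
from urllib.request import Request

data = {'sl': 'en', 'tl': 'it', 'text': 'word'}
base_url = 'http://www.translate.google.com'

# Passing a body as the second argument turns the request into a POST.
# (Python 3 expects the body as bytes, hence the encode call.)
post_request = Request(base_url, urlencode(data).encode('ascii'))

# Appending the encoded parameters to the URL leaves the body empty,
# so the request stays a GET.
get_request = Request(base_url + '?' + urlencode(data))

print(post_request.get_method())  # POST
print(get_request.get_method())   # GET
print(get_request.full_url)
```

No network traffic happens here: `Request` objects only describe a request until they are passed to an opener.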
You'll need to alter the request = urllib2.Request('http://www.translate.google.com', urllib.urlencode(data)) line of your code.
Here goes:
querystring = urllib.urlencode(data)
request = urllib2.Request('http://www.translate.google.com' + '?' + querystring )
And you will get the following output:
<span id="result_box" class="short_text">
<span title="word" onmouseover="this.style.backgroundColor='#ebeff9'" onmouseout="this.style.backgroundColor='#fff'">
parola
</span>
</span>
By the way, you're kinda breaking Google's terms of service; look into them if you're doing more than hacking a little script for training.
requests
I strongly advise you to stay away from urllib if possible, and use the excellent requests library, which will allow you to efficiently use HTTP with Python.
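As a rough sketch of what that could look like, assuming the third-party `requests` package is installed (`pip install requests`): the `params` argument is encoded into the query string, so the request is a GET with no body. The example only prepares the request and does not contact Google; the names `prepared` and `data` are illustrative.

```python
import requests

data = {'sl': 'en', 'tl': 'it', 'text': 'word'}

# Build the request without sending it. `params` ends up in the query
# string, so this is a GET with an empty body.
prepared = requests.Request(
    'GET', 'http://www.translate.google.com', params=data
).prepare()

print(prepared.method)  # GET
print(prepared.url)     # the URL with the query string appended

# To actually send it, the usual one-liner would be:
#   response = requests.get('http://www.translate.google.com', params=data)
#   html = response.text
```

The resulting HTML can then be handed to BeautifulSoup just as in the original script.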