
NLTK: set proxy server


I'm trying to learn NLTK (the Natural Language Toolkit, written in Python), and I want to install a sample data set to run some examples.

My web connection uses a proxy server, and I'm trying to specify the proxy address as follows:

>>> nltk.set_proxy('http://proxy.example.com:3128' ('USERNAME', 'PASSWORD'))
>>> nltk.download()

But I get an error:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'str' object is not callable

I decided to set up a ProxyBasicAuthHandler before calling nltk.download():

import urllib2

auth_handler = urllib2.ProxyBasicAuthHandler(urllib2.HTTPPasswordMgrWithDefaultRealm())
auth_handler.add_password(realm=None, uri='http://proxy.example.com:3128/', user='USERNAME', passwd='PASSWORD')
opener = urllib2.build_opener(auth_handler)
urllib2.install_opener(opener)

import nltk
nltk.download()

But now I get HTTP Error 407 (Proxy Authentication Required).

The documentation says that if the proxy is set to None, nltk.set_proxy will attempt to detect the system proxy, but that isn't working either.

How can I install a sample data set for NLTK?


Solution

  • The website where you got the code for your first attempt has an error in it (I have seen that same error).

    The line in error is

    nltk.set_proxy('http://proxy.example.com:3128' ('USERNAME', 'PASSWORD'))
    

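    A comma is needed to separate the two arguments. Without it, Python reads the string literal followed by the parenthesized tuple as an attempt to call the string, which is what raises TypeError: 'str' object is not callable. The correct line is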

    nltk.set_proxy('http://proxy.example.com:3128', ('USERNAME', 'PASSWORD'))
    

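    With the comma in place, the proxy URL and the credentials are passed as two separate arguments, and nltk.download() should then go through the proxy.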
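
To see why the missing comma produces that exact traceback, here is a minimal sketch that needs neither NLTK nor a network connection. `fake_set_proxy` is a hypothetical stand-in written only for this illustration; it is not NLTK's implementation, and it simply records the arguments it receives in the tuple form used above.

```python
import warnings


def fake_set_proxy(proxy, credentials=(None, "")):
    """Hypothetical stand-in for nltk.set_proxy; it only records its arguments."""
    user, password = credentials
    return {"proxy": proxy, "user": user, "password": password}


# The argument expression from the broken line, with no comma between the
# string and the tuple. Python parses this as: call the string with two arguments.
broken_argument = "'http://proxy.example.com:3128' ('USERNAME', 'PASSWORD')"

error_message = None
with warnings.catch_warnings():
    # Recent Python versions also emit a SyntaxWarning for this pattern
    # at compile time; silence it so only the runtime error is shown.
    warnings.simplefilter("ignore", SyntaxWarning)
    try:
        eval(broken_argument)
    except TypeError as exc:
        error_message = str(exc)

print("Without the comma:", error_message)

# With the comma, the URL and the credentials are two separate arguments.
fixed = fake_set_proxy('http://proxy.example.com:3128', ('USERNAME', 'PASSWORD'))
print("With the comma:", fixed)
```

The first print shows the same "not callable" message as the traceback in the question, and the second shows the URL and credentials arriving separately. One caveat: as far as I know, newer NLTK releases take the credentials as separate `user` and `password` arguments rather than a tuple, so it is worth checking `help(nltk.set_proxy)` for the version you have installed.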