python, django, url-validation

Django URLValidator produces bogus errors


I'm using the Django URLValidator in the following way in a form:

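from django import forms
from django.core.exceptions import ValidationError
from django.core.validators import URLValidator

def clean_url(self):
    # Method on a forms.Form subclass; `logger` is a module-level logging.Logger.
    # verify_exists=True makes the validator request the URL, not just parse it.
    validate = URLValidator(verify_exists=True)
    url = self.cleaned_data.get('url')

    try:
        logger.info(url)
        validate(url)
    except ValidationError as e:
        logger.info(e)
        raise forms.ValidationError("That website does not exist. Please try again.")

    return url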

It works for some URLs but fails for other valid ones. For example, http://www.amazon.com/ fails validation (which is obviously incorrect), while http://www.cisco.com/ passes. Is there any reason for the bogus errors?


Solution

  • Look at the source for URLValidator; if you specify verify_exists, it makes a HEAD request to the URL to check that it exists:

    req = urllib2.Request(url, None, headers)
    req.get_method = lambda: 'HEAD'
    ...
    opener.open(req, timeout=10)
    

    Try making the HEAD request to Amazon yourself, and you'll see the problem:

    carl@chaffinch:~$ HEAD http://www.amazon.com
    405 MethodNotAllowed
    Date: Mon, 13 Aug 2012 18:50:56 GMT
    Server: Server
    Vary: Accept-Encoding,User-Agent
    Allow: POST, GET
    ...
    

    Amazon rejects HEAD requests, so the validator treats the URL as nonexistent. I can't see a way of solving this other than monkey-patching or otherwise extending URLValidator to use a GET or POST request. Before doing so, you should think carefully about whether to use verify_exists at all (without it, this problem should go away). As core/validators.py itself says,

    "The URLField verify_exists argument has intractable security and performance issues. Accordingly, it has been deprecated."

    You'll find that the in-development version of Django has indeed disposed of this feature completely.
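
If you do decide you need an existence check, the "use a GET request" idea above could look roughly like the following. This is only a minimal sketch, not Django's implementation: it uses Python 3's urllib.request (the successor to urllib2), and the names fetch_status and url_exists are hypothetical helpers made up for illustration. It tries HEAD first and retries with GET only when the server answers 405 or 501, as Amazon does in the transcript above.

```python
import urllib.error
import urllib.request


def fetch_status(url, method, timeout=10):
    """Return the HTTP status code for a single request (hypothetical helper)."""
    req = urllib.request.Request(url, method=method)
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as e:
        # urllib raises on 4xx/5xx responses; the status code is still useful.
        return e.code


def url_exists(url, fetch=fetch_status):
    """Try HEAD first; retry with GET if the server rejects HEAD."""
    try:
        status = fetch(url, "HEAD")
        if status in (405, 501):  # HEAD not allowed / not implemented
            status = fetch(url, "GET")
    except (urllib.error.URLError, OSError, ValueError):
        # DNS failure, refused connection, timeout, or a malformed URL.
        return False
    return status < 400
```

You could call url_exists(url) from clean_url in place of URLValidator(verify_exists=True) and raise forms.ValidationError when it returns False. The fetch parameter exists only so the network call can be swapped out in tests. Keep in mind that a check like this has the same security and performance concerns that led Django to deprecate verify_exists, since it makes your server issue requests to user-supplied URLs.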