Search code examples
pythonurlpython-2.7urlparse

Python: How to check if a string is a valid IRI?


Is there a standard function to check an IRI, to check an URL apparently I can use:

parts = urlparse.urlsplit(url)  
    if not parts.scheme or not parts.netloc:  
        '''apparently not an url'''

I tried the above with an URL containing Unicode characters:

import urlparse
url = "http://fdasdf.fdsfîășîs.fss/ăîăî"
parts = urlparse.urlsplit(url)
if not parts.scheme or not parts.netloc:  
    print "not an url"
else:
    print "yes an url"

and what I get is yes an url. Does this means I'm good an this tests for valid IRI? Is there another way ?


Solution

  • Using urlparse is not sufficient to test for a valid IRI.

    Use the rfc3987 package instead:

    from rfc3987 import parse
    
    parse('http://fdasdf.fdsfîășîs.fss/ăîăî', rule='IRI')