python bash urlparse

Unable to parse URL with Python urlparse


I am trying to write a small script that takes a URL as input and parses it.

Here is my script:

#! /usr/bin/env python

import sys

from urlparse import urlsplit
url = sys.argv[1]
parseUrl = urlsplit(url)
print 'scheme  :', parseUrl.scheme
print 'netloc  :', parseUrl.netloc

But when I run the script with ./myscript http://www.example.com, it shows the following error:

AttributeError: 'tuple' object has no attribute 'scheme'

I am new to Python and scripting. What am I doing wrong?

Edit: I am using Python 2.7.5.


Solution

  • The quickest fix is to read the fields by position: index 0 of the returned tuple is the scheme and index 1 is the network location.

    print 'scheme  :', parseUrl[0]
    print 'netloc  :', parseUrl[1]
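
A minimal sketch of the index-based approach, written so that it should run on old and new interpreters alike. The try/except import, the `describe` helper, and the default URL are illustrative additions and not part of the original script. Printing `sys.version_info` is a quick way to see which interpreter `/usr/bin/env python` actually picked.

```python
import sys

try:
    from urllib.parse import urlsplit  # Python 3 location
except ImportError:
    from urlparse import urlsplit      # Python 2 location, as in the question


def describe(url):
    """Return (scheme, netloc) using index access, which works on any tuple."""
    parts = urlsplit(url)
    return parts[0], parts[1]


if __name__ == "__main__":
    # Hypothetical default URL so the sketch also runs without arguments.
    url = sys.argv[1] if len(sys.argv) > 1 else "http://www.example.com"
    scheme, netloc = describe(url)
    version = ".".join(str(n) for n in sys.version_info[:3])
    sys.stdout.write("python  : %s\n" % version)
    sys.stdout.write("scheme  : %s\n" % scheme)
    sys.stdout.write("netloc  : %s\n" % netloc)
```

If the reported version is older than the one you expected, the shebang line is probably resolving to a different interpreter than the one you checked.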
    

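    On Python 2.5 and later, both urlparse() and urlsplit() return a tuple subclass that also supports the .scheme and .netloc notation, so an AttributeError on a plain 'tuple' suggests the script is being run by an older interpreter than the 2.7.5 you checked (#!/usr/bin/env python picks whichever python comes first on PATH). Index access works in either case. The documentation for urlsplit() says: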

    This is similar to urlparse(), but does not split the params from the URL. This should generally be used instead of urlparse() if the more recent URL syntax allowing parameters to be applied to each segment of the path portion of the URL (see RFC 2396) is wanted. A separate function is needed to separate the path segments and parameters. This function returns a 5-tuple: (addressing scheme, network location, path, query, fragment identifier).

    The return value is actually an instance of a subclass of tuple. This class has the following additional read-only convenience attributes:

    Attribute   Index   Value                               Value if not present
    scheme      0       URL scheme specifier                empty string
    netloc      1       Network location part               empty string
    path        2       Hierarchical path                   empty string
    query       3       Query component                     empty string
    fragment    4       Fragment identifier                 empty string
    username            User name                           None
    password            Password                            None
    hostname            Host name (lower case)              None
    port                Port number as integer, if present  None
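
The attributes in the table can be exercised with a short sketch like the one below. The URL, including its credentials and port, is made up for illustration, and the import fallback is an assumption that lets the same code run on Python 2 and Python 3. It also shows the difference from urlparse() mentioned in the quoted documentation: urlsplit() leaves the `;v=1` parameters inside the path.

```python
try:
    from urllib.parse import urlsplit, urlparse  # Python 3 location
except ImportError:
    from urlparse import urlsplit, urlparse      # Python 2 location

# Hypothetical URL chosen so that every field is populated.
url = "http://user:secret@WWW.Example.com:8080/a/b;v=1?q=1#top"

parts = urlsplit(url)

# The five positional fields, readable by index or by name.
scheme, netloc, path, query, fragment = parts
same_by_name = (parts.scheme, parts.netloc) == (parts[0], parts[1])

# Attributes that have no index in the tuple.
username = parts.username
password = parts.password
hostname = parts.hostname  # lower-cased by the library
port = parts.port          # converted to an integer

# urlparse() returns a 6-tuple and splits the params out of the path.
parsed = urlparse(url)
split_path = parts.path
parsed_path, parsed_params = parsed.path, parsed.params
```

Here `hostname` comes back as `www.example.com` and `port` as the integer `8080`, while `netloc` keeps the original spelling, credentials and port included.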