python-3.4, urlparse

Python 3: Why would you use urlparse/urlsplit?


I'm not exactly sure what these functions are used for. I get that they split a URL into its components, but why would that be useful, or what is an example of when to use urlparse?


Solution

  • Use urlparse only if you need the params component of the URL; otherwise urlsplit is enough. What params are, and when you might need them, is explained below.

    Reference

    urllib.parse.urlsplit(urlstring, scheme='', allow_fragments=True)

    This is similar to urlparse(), but does not split the params from the URL. This should generally be used instead of urlparse() if the more recent URL syntax allowing parameters to be applied to each segment of the path portion of the URL (see RFC 2396) is wanted.
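    To make the difference concrete, here is a minimal sketch comparing the two functions. The URL is a made-up example; the `;size=m` suffix is an assumption added only so that the params component is not empty.

```python
from urllib.parse import urlparse, urlsplit

# Hypothetical URL; ";size=m" is added only to populate params.
url = "http://www.example.com/products/women;size=m?color=green#top"

parsed = urlparse(url)  # 6 fields: scheme, netloc, path, params, query, fragment
split = urlsplit(url)   # 5 fields: scheme, netloc, path, query, fragment

# urlparse separates the params of the last path segment...
print(parsed.path)    # /products/women
print(parsed.params)  # size=m

# ...while urlsplit leaves them inside the path.
print(split.path)     # /products/women;size=m

# Everything else is the same in both results.
print(parsed.query, split.query)  # color=green color=green
```

    If the URL has no `;` in its last path segment, both functions return the same path and urlparse simply reports an empty params string.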

    The hostname is usually worth storing in a variable: while scraping you can reuse it later and attach a different path or query to it to build the URL of the page you want.
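    As a rough sketch of reusing the host while scraping: the helper `build_url` below is hypothetical (not part of urllib), and the paths and query values are assumptions chosen for illustration.

```python
from urllib.parse import urlsplit, urlunsplit, urlencode

page = urlsplit("http://www.example.com/products/women?color=green")

# Keep the host around for later use.
host = page.hostname
print(host)  # www.example.com


def build_url(path, **query):
    """Hypothetical helper: same scheme and host, new path and query."""
    return urlunsplit((page.scheme, page.netloc, path, urlencode(query), ""))


print(build_url("/products/men", color="blue"))
# http://www.example.com/products/men?color=blue
```

    Note that `netloc` (which may include a port or credentials) is what goes back into the URL, while `hostname` is the bare, lowercased host name.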

    Regarding parameters:

    FYI, this is what RFC 2396 says about parameters in a URL:

    Extensive testing of current client applications demonstrated that the majority of deployed systems do not use the ";" character to indicate trailing parameter information, and that the presence of a semicolon in a path segment does not affect the relative parsing of that segment. Therefore, parameters have been removed as a separate component and may now appear in any path segment. Their influence has been removed from the algorithm for resolving a relative URI reference.

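    Splitting the URL is useful in scraping, e.g. if the URL is http://www.example.com/products/women?color=green

    When you use urlparse, you get the path (/products/women) and the query (color=green) as separate components. Strictly speaking, women here is the last path segment rather than the params component, which is only filled when that segment carries a suffix introduced by ";". You can now change women to men, giving http://www.example.com/products/men?color=green, and do the same for kids, girl, boy and so on.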
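    The swap from women to men described above can be sketched as follows. The function name `swap_last_segment` is hypothetical, and the sketch assumes the category is always the last path segment.

```python
from urllib.parse import urlsplit


def swap_last_segment(url, new_segment):
    """Replace the last path segment, keeping scheme, host and query."""
    parts = urlsplit(url)
    # "/products/women" -> parent "/products"
    parent, _, _ = parts.path.rpartition("/")
    new_path = parent + "/" + new_segment
    # SplitResult is a named tuple, so _replace returns a modified copy.
    return parts._replace(path=new_path).geturl()


url = "http://www.example.com/products/women?color=green"
for category in ["men", "kids", "girl", "boy"]:
    print(swap_last_segment(url, category))
# http://www.example.com/products/men?color=green
# http://www.example.com/products/kids?color=green
# http://www.example.com/products/girl?color=green
# http://www.example.com/products/boy?color=green
```

    Doing the same with plain string replacement would risk touching the host or the query by accident; working on the parsed path avoids that.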