
Url Parse is missing fragment - Python


I need to save a file under the same name as the file referenced by a given acquisition path.

Given a URL, I would like to parse it and extract the name of the file. Here's my code.

I read a JSON parameter and pass it to urlparse. The acquisition path is a string.

ParseUrl.py:

import os
from urllib.parse import urlparse as up

# jtp is the parsed JSON; jtp["AcquisitionPath"] is
# "http://127.0.0.1:8000/Users/YodhResearch/Desktop/LongCtrl10min.tiff"
a = up(jtp["AcquisitionPath"])
print(a)
print(os.path.basename(a))    # raises the TypeError shown below

Result:

ParseResult(scheme='http', netloc='127.0.0.1:8000', path='/Users/YodhResearch/Desktop/LongCtrl10min.tiff', params='', query='', fragment='')
[....]
TypeError: expected str, bytes or os.PathLike object, not ParseResult

As you can see, it parses the URL, but "LongCtrl10min.tiff" is not in the fragment field; everything ends up in the path field. Why is that happening? Is it because "AcquisitionPath" is a string and urlparse treats the whole thing as a single path?

EDIT:

a.path works, but I would like to know why the file name does not end up in the fragment field.

Here's another example:

import os
from urllib.parse import urlparse as up

string = "http://127.0.0.1:8000/GIULIO%20FERRARI%20FOLDER/Giulio%20_%20CSV/Py%20Script/sparse%20python/tiff_test.tiff_IDAnal#1_IDAcq#10_TEMP_.json"

a = up(string)
print(a)
print(os.path.basename(a))

Results:

ParseResult(scheme='http', netloc='127.0.0.1:8000', path='/GIULIO%20FERRARI%20FOLDER/Giulio%20_%20CSV/Py%20Script/sparse%20python/tiff_test.tiff_IDAnal', params='', query='', fragment='1_IDAcq#10_TEMP_.json')

See, now it doesn't return the fragment I expected, which is "tiff_test.tiff_IDAnal#1_IDAcq#10_TEMP_.json".

SOLUTION:

A fragment needs the '#' symbol! Thanks to all.


Solution

  • There are two issues here: how to identify the components of a URL, and how to create the desired path from those components.


    First, you are confused over what the fragment actually is. From RFC 3986:

    The following are two example URIs and their component parts:
    
             foo://example.com:8042/over/there?name=ferret#nose
             \_/   \______________/\_________/ \_________/ \__/
              |           |            |            |        |
           scheme     authority       path        query   fragment
              |   _____________________|__
             / \ /                        \
             urn:example:animal:ferret:nose
    

    The fragment is only the portion following the #, not the entire final component of the path.
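
As a minimal sketch of that rule, the snippet below parses the second URL from the question and inspects the two fields. It only assumes the standard library; the variable names are illustrative. Note that the split happens at the first '#', so any later '#' stays inside the fragment.

```python
from urllib.parse import urlparse

# Second URL from the question; it contains two '#' characters.
url = (
    "http://127.0.0.1:8000/GIULIO%20FERRARI%20FOLDER/Giulio%20_%20CSV/"
    "Py%20Script/sparse%20python/tiff_test.tiff_IDAnal#1_IDAcq#10_TEMP_.json"
)

parts = urlparse(url)

# Everything before the first '#' (after the authority) is the path.
print(parts.path.rsplit("/", 1)[-1])  # tiff_test.tiff_IDAnal

# Everything after the first '#' is the fragment, later '#' included.
print(parts.fragment)                 # 1_IDAcq#10_TEMP_.json
```

A URL without any '#' simply has an empty fragment, which is what the first example in the question shows.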


    Second, the urlparse() function from the urllib.parse module returns a ParseResult object, whereas os.path.basename() expects a str (or bytes, or an os.PathLike object) as its argument. That mismatch is what raises the TypeError.

    What you probably want is the path stored in the ParseResult object. You get it with a.path (the path component of the URL you passed to urlparse is stored in the path attribute of the ParseResult object).

    import os
    from urllib.parse import urlparse as up

    a = up("http://127.0.0.1:8000/Users/YodhResearch/Desktop/LongCtrl10min.tiff")
    print(os.path.basename(a.path))    # pass the path string, not the ParseResult
    

    This will output:

    LongCtrl10min.tiff

    If you also want to include the fragment, you have to add it explicitly. The fragment is stored in a separate attribute of the ParseResult object, a.fragment in your case:

    import os
    from urllib.parse import urlparse as up

    a = up("http://127.0.0.1:8000/Users/YodhResearch/Desktop/LongCtrl10min.tiff#anyfragment")
    print(os.path.basename(a.path) + "#" + a.fragment)
    

    will output:

    LongCtrl10min.tiff#anyfragment
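
The two steps can be combined into one small helper. This is only a sketch under a few assumptions: filename_from_url is a hypothetical name, not part of urllib; the fragment is re-attached only when it is non-empty, so a URL without '#' does not gain a trailing '#'; and posixpath.basename is used in place of os.path.basename because URL paths always use '/' regardless of the operating system.

```python
import posixpath
from urllib.parse import urlparse


def filename_from_url(url):
    """Return the last path segment, re-attaching the fragment if present.

    Hypothetical helper for illustration; percent-escapes such as %20
    are left untouched.
    """
    parts = urlparse(url)
    name = posixpath.basename(parts.path)
    if parts.fragment:
        # urlparse drops the '#' separator, so put it back.
        name += "#" + parts.fragment
    return name


print(filename_from_url(
    "http://127.0.0.1:8000/Users/YodhResearch/Desktop/LongCtrl10min.tiff"
))  # LongCtrl10min.tiff
print(filename_from_url(
    "http://127.0.0.1:8000/Users/YodhResearch/Desktop/LongCtrl10min.tiff#anyfragment"
))  # LongCtrl10min.tiff#anyfragment
```

Applied to the second URL from the question, the same helper would return tiff_test.tiff_IDAnal#1_IDAcq#10_TEMP_.json, which is the name the asker expected.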