Tags: python, url, flickr, url-shortener

Unshorten Flic.kr URLs


I have a Python script that unshortens URLs, based on the answer posted here. So far it has worked well with, e.g., youtu.be, goo.gl, t.co, bit.ly, and tinyurl.com. However, I noticed that it does not work for Flickr's own URL shortener, flic.kr.

For example, when I enter the URL

https://flic.kr/p/qf3mGd

into a browser, I get redirected correctly to

https://www.flickr.com/photos/106783633@N02/15911453212/

However, when I use the Python script to unshorten the same URL, I get the following redirects:

https://flic.kr/p/qf3mgd
http://www.flickr.com/photo.gne?short=qf3mgd
http://www.flickr.com/signin/?acf=%2Fphoto.gne%3Fshort%3Dqf3mgd
https://login.yahoo.com/config/login?.src=flickrsignin&.pc=8190&.scrumb=[...]

thus eventually ending up on the Yahoo login page. Unshort.me, by the way, can unshorten the URL correctly. What am I missing here?
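A minimal Python 3 sketch of how a chain like the one above can be traced hop by hop, so that each Location header is visible. It assumes a hypothetical head(url) callable that returns the HTTP status and the Location header (or None); the name trace_redirects, the 10-hop default, and the example.com URLs are illustrative assumptions, not part of the original script. Unlike the script, this sketch resolves relative Location values with urljoin instead of stopping at them.

```python
from typing import Callable, List, Optional, Tuple
from urllib.parse import urljoin

# Hypothetical interface: head(url) -> (HTTP status, Location header or None).
HeadFunc = Callable[[str], Tuple[int, Optional[str]]]


def trace_redirects(url: str, head: HeadFunc, max_hops: int = 10) -> List[str]:
    """Follow 3xx redirects from url; return every URL visited, in order."""
    chain = [url]
    for _ in range(max_hops):
        status, location = head(chain[-1])
        if not (300 <= status < 400) or not location:
            break  # not a redirect, so chain[-1] is where we ended up
        # Location may be relative; resolve it against the URL that sent it.
        next_url = urljoin(chain[-1], location)
        if next_url in chain:
            break  # redirect loop: stop rather than cycle forever
        chain.append(next_url)
    return chain


if __name__ == "__main__":
    # Stubbed responses with made-up URLs, so no network access is needed.
    stub = {
        "https://short.example/abc": (301, "http://www.example.com/item?id=abc"),
        "http://www.example.com/item?id=abc": (302, "/signin/"),
        "http://www.example.com/signin/": (200, None),
    }
    for hop in trace_redirects("https://short.example/abc", lambda u: stub[u]):
        print(hop)
```

Swapping the stub for a function that performs a real HEAD request prints the actual hops, which may make it easier to spot where the script's chain first differs from what a browser does.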

Here is the full source code of my script, which I adapted after stumbling upon some pathological cases with the original:

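import urlparse
import httplib


def unshorten_url(url, max_tries=10):
    """Follow HEAD redirects starting at url, for at most max_tries hops."""
    return __unshorten_url(url, [], max_tries)

def __unshorten_url(url, check_urls, max_tries):
    # Out of tries: fall back to the URL we started from.
    if max_tries == 0:
        if len(check_urls) > 0:
            return check_urls[0]
        return url
    # Already visited this URL: we are in a redirect loop.
    if url in check_urls:
        return url
    unshortened = ''
    try:
        parsed = urlparse.urlparse(url)
        # Note: HTTPConnection always speaks plain HTTP (port 80 by default),
        # even when url starts with https://.
        h = httplib.HTTPConnection(parsed.netloc)
        h.request('HEAD', url)
    except Exception:
        return None
    try:
        response = h.getresponse()
    except Exception:
        return url

    # Any 3xx status with a Location header is a redirect to follow.
    if response.status // 100 == 3 and response.getheader('Location'):
        unshortened = response.getheader('Location')
    else:
        return url
    #print max_tries, unshortened
    if unshortened != url:
        # Location values without 'http' (e.g. relative paths) are not followed.
        if 'http' not in unshortened:
            return url
        check_urls.append(url)
        return __unshorten_url(unshortened, check_urls, (max_tries-1))
    else:
        return unshortened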

print unshorten_url('http://t.co/5skmePb7gp')

EDIT: The script above is now a full working example, using a t.co URL.
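In the script, httplib.HTTPConnection is used for every URL. That class always opens a plain-HTTP connection (port 80 unless the host string names another port), so an https:// short link such as https://flic.kr/p/qf3mGd is actually requested over HTTP, with the absolute URL as the request target. Whether this alone explains the Flickr behavior is not established here, but it is worth ruling out. Below is a minimal Python 3 sketch (http.client is the Python 3 name of httplib) that picks the connection class from the URL scheme; make_connection and head_request are hypothetical helper names, and the 10-second timeout is an assumption.

```python
import http.client
from typing import Optional, Tuple
from urllib.parse import urlsplit


def make_connection(url: str, timeout: float = 10.0) -> Tuple[http.client.HTTPConnection, str]:
    """Return an unopened connection matching the URL's scheme, plus the request target."""
    parts = urlsplit(url)
    if parts.scheme == "https":
        conn = http.client.HTTPSConnection(parts.netloc, timeout=timeout)
    elif parts.scheme == "http":
        conn = http.client.HTTPConnection(parts.netloc, timeout=timeout)
    else:
        raise ValueError(f"unsupported scheme: {parts.scheme!r}")
    # Send only path + query as the request target, not the absolute URL.
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    return conn, target


def head_request(url: str) -> Tuple[int, Optional[str]]:
    """Send one HEAD request and return (status, Location header or None)."""
    conn, target = make_connection(url)
    try:
        conn.request("HEAD", target)
        response = conn.getresponse()
        return response.status, response.getheader("Location")
    finally:
        conn.close()  # always release the socket, even on errors
```

head_request("https://flic.kr/p/qf3mGd") would return the status and Location header of the first hop only; following later hops is still up to the caller, as the recursive __unshorten_url does.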


Solution

  • I'm using Requests [0] rather than httplib, as follows, and it works fine with URLs like https://flic.kr/p/qf3mGd:

    >>> import requests
    >>> requests.head("https://flic.kr/p/qf3mGd", allow_redirects=True, verify=False).url
    u'https://www.flickr.com/photos/106783633@N02/15911453212/'
    

    [0] http://docs.python-requests.org/en/latest/
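Requests keeps the intermediate redirect responses in Response.history (oldest first), so the hops it followed can be compared with the chain the httplib script produced. Note that requests.head does not follow redirects by default, which is why allow_redirects=True is passed. A minimal sketch, assuming Requests is installed; redirect_chain and unshorten are hypothetical names, the timeout value is an assumption, and verify=False is left out so certificate checking stays on.

```python
from typing import List


def redirect_chain(response) -> List[str]:
    """Return every URL Requests visited: each redirect hop, then the final URL."""
    # response.history holds the intermediate redirect responses, oldest first.
    return [hop.url for hop in response.history] + [response.url]


def unshorten(url: str, timeout: float = 10.0) -> List[str]:
    """HEAD the URL, following redirects, and return the full redirect chain."""
    import requests  # third-party; imported here so redirect_chain has no dependency

    response = requests.head(url, allow_redirects=True, timeout=timeout)
    return redirect_chain(response)
```

For example, looping over unshorten("https://flic.kr/p/qf3mGd") and printing each entry would list the URLs visited; the exact hops depend on how Flickr responds at the time.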