python, list, unicode-string

Removing duplicates from a list of unicode strings


I am trying to remove duplicates from a list of unicode strings without changing the order in which the elements appear (so I don't want to use set).

Program:

result = [u'http://google.com', u'http://www.catb.org/esr/faqs/hacker-howto.html', u'http://www.catb.org/~esr/faqs/hacker-howto.html', u'http://amazon.com', u'http://www.catb.org/esr/faqs/hacker-howto.html', u'http://yahoo.com']
result.reverse()
for e in result:
    count_e = result.count(e)
    if count_e > 1:
        for i in range(0, count_e - 1):
            result.remove(e)
result.reverse()
print result

Output:

[u'http://google.com', u'http://www.catb.org/esr/faqs/hacker-howto.html', u'http://www.catb.org/~esr/faqs/hacker-howto.html', u'http://amazon.com', u'http://yahoo.com']

Expected Output:

[u'http://google.com', u'http://catb.org/~esr/faqs/hacker-howto.html', u'http://amazon.com', u'http://yahoo.com']

So, is there a way to do this as simply as possible?


Solution

  • You actually don't have duplicates left in your list: the strings you want merged are not identical. One time you have http://catb.org while another time you have http://www.catb.org.

    You'll have to figure out a way to determine whether the URL has www. in front or not.
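
One way to act on this, as a minimal sketch rather than a definitive fix: compare the URLs by a normalized key that ignores a leading www., and keep the first URL seen for each key so the original order is preserved. The names normalize and dedupe are hypothetical helpers introduced here for illustration, and the rule "strip www. right after the scheme" is an assumption about what should count as the same URL.

```python
def normalize(url):
    """Hypothetical helper: return the URL without a leading 'www.' on the host."""
    scheme, sep, rest = url.partition(u'://')
    # Only strip 'www.' when it directly follows the scheme separator.
    if sep and rest.startswith(u'www.'):
        rest = rest[len(u'www.'):]
    return scheme + sep + rest


def dedupe(urls, key=normalize):
    """Keep the first URL for each normalized key, preserving input order."""
    seen = set()
    unique = []
    for url in urls:
        k = key(url)
        if k not in seen:
            seen.add(k)
            unique.append(url)
    return unique


urls = [
    u'http://google.com',
    u'http://catb.org/~esr/faqs/hacker-howto.html',
    u'http://www.catb.org/~esr/faqs/hacker-howto.html',
    u'http://amazon.com',
    u'http://google.com',
]
unique_urls = dedupe(urls)
# unique_urls keeps google.com, the first catb.org entry, and amazon.com, in that order.
```

Unlike calling remove() on a list while looping over it, this builds a new list and only reads the input. Note that it handles only the www. difference: the paths /esr/ and /~esr/ in the question's list are still different strings, so treating those as the same page would need an extra rule of your own inside normalize.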