Given a set of 1 million (very large) no. of URL's. Find the "first" "unique" URL from the list.
My Approach: Build a hash using perfect hashing function, that can help. But my question is to hash large data is not possible., then how can I solve this question.
Is there any method to do inplace?
Please help. Thanks in advance.
Given an input list of ["c","a","b","a","c"]
, my first approach would be:
- Convert the list of URLs into a list of tuples which associates each element which its position in the list. Now you have
[(0,"c"),(1,"a"),(2,"b"),(3,"a"),(4,"c")]
.
- Sort the list lexicographically by the second tuple element (the URL). Now you have
[(1,"a"),(3,"a"),(2,"b"),(0,"c"),(4,"c")]
.
- Group sequences of subsequent equal tuples (a tuple is equal if the second element equals) into sub-lists. Now you have
[[(1,"a"),(3,"a")],[(2,"b")],[(0,"c"),(4,"c")]]
.
- Filter the list so that you only have lists of length 1. Now you have
[[(2,"b")]]
.
- If the resulting list is empty, there is no unique URL in the list. If it is non-empty, Sort the list by the first tuple element (the position in the string). In this case, you get the same list back -
[[(2,"b")]]
.
- take the first element of the list. Now you have
[(2,"b")]
.
- The (only) tuple in this list tells you the first unique URL, and the position in the input list: it's the URL
b
at position 2
in the input list.