Search code examples
sortingdata-structureshashmaptime-complexitytrie

Hash-maps or search tree?


The problem is as follows: Given is a list of cities and their countries, population and geo-coordinates. You should read this data, save it and answer it in an endless loop of the following type:

Request: a prefix (e.g., free).

Answer: all states beginning with this prefix ("case-insensitive") and their associated data (country + population + geo-coordinates). The cities should be sorted by population (highest population first).

  • Which data structure are the most suitable for the described problem ?

First Part : My Thoughts are hanging between Trie and Hashmap. Although i tend to the Trie more because i'm dealing with prefix requests , and Trie is basically according to Wikipedia :

"a trie, also called digital tree and sometimes radix tree or prefix tree (as they can be searched by prefixes), is a kind of search tree—an ordered tree data structure that is used to store a dynamic set or associative array where the keys are usually strings".

in addition to that in terms of Storage and reading data Trie has the advantage over Hash-maps.

Second part: returning the sorted cities by population would be a little bit challenging when we speak about Time Complexity.If i'm thinking in the right direction i should save the values of the keys as lists and it will be easier to sort just the returning list , so i don't have to save it sorted to save some times.

Please share you thoughts and correct me if i'm wrong .


Solution

  • There are pros of cons of picking vanilla tries and vanilla hashmaps. In general, for autocomplete systems, the structure of a trie is extremely useful because you're usually searching for prefixes and the user would like to see the words that begin with the string that they have just entered.

    However, there is a method to make the best use of both of these data structures, it is called a Hash Trie (implementation: http://www.sanfoundry.com/java-program-implement-hash-trie/). So the way you would implement this is by using the structure of the trie, but the final node is the actual string it refers to. In python, this is done using dictionaries instead of lists while implementing the trie.

    For the second half of the question, a list would be your best bet, in essence a list of tuples (population, city) and sort by the population and return the cities. Regarding it being "easier" to sort, I'm not sure if I agree with this, easy is a relevant term and there's really no way of saying that it's easier than, maybe storing it in a tree and then returning the Pre-Order Traversal of the tree. Essentially, if you're using comparison based sort, it won't get better than nlog (n).