I have a list of strings and I want to find popular prefixes. The prefixes are special in that they occur as strings in the input list.
I found a similar question here but the answers are geared to find the one most common prefix: Find *most* common prefix of strings - a better way?
While my problem is similar, it differs in that I need to find all popular prefixes. Or to maybe state it a little simplistically, rank prefixes from most common to least.
As an example, consider the following list of strings: in, india, indian, indian flag, bull, bully, bullshit
Prefixes rank: in - 4 times india - 3 times bull - 3 times ...and so on. Please note - in, bull, india are all present in the input list.
The following are not valid prefixes: ind bu bul ...since they do not occur in the input list.
What data structure should I be looking at to model my solution? I'm inclined to use a "trie" with a counter on each node that tracks how many times has that node been touched during the creation of the trie.
All suggestions are welcome. Thanks.
p.s. - I love python and would love if someone could post a quick snippet that could get me started.
words = [ "in", "india", "indian", "indian", "flag", "bull", "bully", "bullshit"]
Result = sorted([ (sum([ w.startswith(prefix) for w in words ]) , prefix ) for prefix in words])[::-1]
it goes through every word as a prefix and checks how many of the other words start with it and then sorts the result. the[::-1] simply reverses that order