Tags: python, algorithm, prefix, information-theory, prefixes

Make a set prefix-free


Is there a standard or best algorithm to make a given set of strings prefix-free? That is, given a set of strings, throw out all strings that have a (shorter) prefix also in that set.

In case it matters, I'm ultimately going to implement this in Python 2.7.


Solution

  • strings = ['a', 'apple', 'b', 'beta', 'c', 'd']
    
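    def prefices_only(strlist):
        # An empty input has nothing to filter; without this guard,
        # ordered[0] below would raise an IndexError.
        if not strlist:
            return []

        # After sorting, any string that has a prefix in the list appears
        # after that prefix, with only other extensions of it in between.
        ordered = sorted(strlist)
        last = ordered[0]
        results = [last]

        for c in ordered[1:]:
            # If c starts with the last kept string, it is an extension
            # (or a duplicate) of it, so it is dropped.
            if not c.startswith(last):
                last = c
                results.append(c)

        return results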
    
    print(prefices_only(strings))
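
To sanity-check the result, it can help to have a slow reference that follows the wording of the question directly. The sketch below is not part of the answer above, and the names `prefix_free_bruteforce` and `is_prefix_free` are made up for illustration. It assumes duplicates should collapse to a single entry and that the output order does not matter (it is returned sorted). It should run unchanged on Python 2.7 and 3.

```python
def prefix_free_bruteforce(strings):
    """Keep only strings that have no other string of the set as a prefix.

    Quadratic in the number of strings; intended as a reference, not for speed.
    """
    unique = set(strings)  # assumption: duplicates collapse to one entry
    kept = [s for s in unique
            if not any(s != p and s.startswith(p) for p in unique)]
    return sorted(kept)


def is_prefix_free(strings):
    """Return True if no string in the set is a proper prefix of another."""
    ordered = sorted(set(strings))
    # In sorted order, a string that is a prefix of anything else is
    # immediately followed by one of its extensions, so checking
    # neighbouring pairs is enough.
    return not any(b.startswith(a) for a, b in zip(ordered, ordered[1:]))


strings = ['a', 'apple', 'b', 'beta', 'c', 'd']
result = prefix_free_bruteforce(strings)
print(result)                   # ['a', 'b', 'c', 'd']
print(is_prefix_free(result))   # True
print(is_prefix_free(strings))  # False
```

Comparing `prefix_free_bruteforce` with the sorted output of `prefices_only` on a few random inputs is a cheap way to gain confidence in the faster sort-based version.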