Pattern finding with LZW in Python

The LZW algorithm finds repeated patterns among input symbols. Can it also find patterns among words? That is, I want the alphabet to consist of whole words rather than individual symbols. For example, for the input:

'abcd', 'abcd', 'fasf', 'asda', 'abcd', 'fasf', ...

I would like output such as:

'abcd', '1', 'fasf', 'asda', '1', '2', ...

Alternatively, is there a compression algorithm that does this?

Solution

def lzw(text):
    # Map each distinct word to its index, assigned in order of first
    # appearance (a plain dict preserves insertion order in Python 3.7+).
    index = {}
    encoded = []
    for tok in text.split():
        if tok in index:
            # Seen before: emit a back-reference (the word's index).
            encoded.append(str(index[tok]))
        else:
            # First occurrence: emit the word itself and register it.
            index[tok] = len(index)
            encoded.append(tok)
    return " ".join(encoded)

print(lzw("abcd abcd fasf asda abcd fasf"))
# outputs: abcd 0 fasf asda 0 1

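This is a pretty easy implementation: the first occurrence of each word is written out, and every later occurrence is replaced by that word's index (0-based, in order of first appearance). Strictly speaking it is a word-level dictionary substitution rather than full LZW, because it never learns multi-word phrases.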
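LZW itself works over any alphabet, so it can run on words directly. Below is a minimal sketch of word-level LZW, assuming the initial dictionary is the list of distinct words in first-seen order and that this word list is stored or sent alongside the codes (classic byte-oriented LZW instead starts from a fixed 256-entry table). The function names `lzw_encode_words` and `lzw_decode_words` are hypothetical. Unlike the substitution above, the table also grows multi-word phrases, so a repeated pair such as "abcd fasf" collapses into a single code.

```python
from typing import Dict, List, Tuple


def lzw_encode_words(words: List[str]) -> Tuple[List[str], List[int]]:
    """Hypothetical word-level LZW encoder: the 'alphabet' is the set of words."""
    # Assumption: the initial table holds each distinct word, in first-seen order.
    alphabet = list(dict.fromkeys(words))
    table: Dict[Tuple[str, ...], int] = {(w,): i for i, w in enumerate(alphabet)}
    codes: List[int] = []
    current: Tuple[str, ...] = ()
    for w in words:
        candidate = current + (w,)
        if candidate in table:
            # Keep extending the longest phrase already in the table.
            current = candidate
        else:
            # Emit the code for the longest known phrase, then learn the new one.
            codes.append(table[current])
            table[candidate] = len(table)
            current = (w,)
    if current:
        codes.append(table[current])
    return alphabet, codes


def lzw_decode_words(alphabet: List[str], codes: List[int]) -> List[str]:
    """Hypothetical inverse of lzw_encode_words; rebuilds the table on the fly."""
    if not codes:
        return []
    table: List[Tuple[str, ...]] = [(w,) for w in alphabet]
    prev = table[codes[0]]
    out = list(prev)
    for code in codes[1:]:
        if code < len(table):
            entry = table[code]
        else:
            # Classic LZW corner case: the code names the entry being built now.
            entry = prev + (prev[0],)
        out.extend(entry)
        table.append(prev + (entry[0],))
        prev = entry
    return out


alphabet, codes = lzw_encode_words("abcd abcd fasf asda abcd fasf".split())
print(alphabet, codes)  # ['abcd', 'fasf', 'asda'] [0, 0, 1, 2, 4]
print(" ".join(lzw_decode_words(alphabet, codes)))  # abcd abcd fasf asda abcd fasf
```

Here code 4 is a learned phrase standing for "abcd fasf". On short inputs the gain is small; the phrase table only pays off when word sequences repeat often.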