Tags: python, pattern-matching, lzw

Pattern finding with LZW in Python


The LZW algorithm is used to find patterns among input symbols. But can it find patterns among words? I mean, I want the alphabet to consist not of single symbols but of whole words. For example, for the input:

'abcd', 'abcd', 'fasf', 'asda', 'abcd', 'fasf', ...

I would like to get an output like:

'abcd', '1', 'fasf', 'asda', '1', '2', ...

Or is there another compression algorithm that does the trick?
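LZW itself does not depend on what its symbols are, so the same procedure can be run with whole words as the alphabet. The following is a minimal sketch, not taken from any library: `lzw_encode_words` and `lzw_decode_words` are hypothetical names, and it assumes the initial dictionary is simply the list of distinct words in the input (classic LZW instead starts from a fixed alphabet such as the 256 byte values), which the decoder has to receive along with the codes.

```python
def lzw_encode_words(words):
    """Word-level LZW sketch: the alphabet is the set of distinct words.

    Returns (alphabet, codes). The alphabet (initial dictionary) must be
    sent along with the codes so that a decoder can rebuild the table.
    """
    alphabet = list(dict.fromkeys(words))  # distinct words, first-seen order
    table = {(w,): i for i, w in enumerate(alphabet)}  # phrase -> code
    codes = []
    phrase = ()
    for w in words:
        candidate = phrase + (w,)
        if candidate in table:
            phrase = candidate             # keep extending the current match
        else:
            codes.append(table[phrase])    # emit code of longest known phrase
            table[candidate] = len(table)  # learn the one-word-longer phrase
            phrase = (w,)
    if phrase:
        codes.append(table[phrase])        # flush the final match
    return alphabet, codes


def lzw_decode_words(alphabet, codes):
    """Rebuild the word list from the alphabet and the LZW codes."""
    if not codes:
        return []
    table = {i: (w,) for i, w in enumerate(alphabet)}  # code -> phrase
    prev = table[codes[0]]
    out = list(prev)
    for code in codes[1:]:
        if code in table:
            entry = table[code]
        else:
            # Code not learned yet: LZW's special case, where the phrase is
            # the previous phrase followed by its own first word.
            entry = prev + (prev[0],)
        out.extend(entry)
        table[len(table)] = prev + (entry[0],)  # learn what the encoder learned
        prev = entry
    return out


words = "abcd abcd fasf asda abcd fasf".split()
alphabet, codes = lzw_encode_words(words)
print(alphabet, codes)  # ['abcd', 'fasf', 'asda'] [0, 0, 1, 2, 4]
```

Six words come out as five codes: the second `abcd fasf` is recognized as a two-word phrase learned earlier (code 4). Unlike a plain word-to-index substitution, the dictionary keeps growing with multi-word phrases, which is where LZW gets its compression on longer inputs.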


Solution

    def lzw(text):
        """Emit each word literally on its first occurrence, then replace
        every repeat with that word's index (0-based, in order of first
        appearance)."""
        index = {}                               # word -> index of first appearance
        encoded = []
        for tok in text.split():
            if tok in index:
                encoded.append(str(index[tok]))  # repeat: back-reference
            else:
                index[tok] = len(index)          # new word: assign next index
                encoded.append(tok)              # and write it out literally
        return " ".join(encoded)

    print(lzw("abcd abcd fasf asda abcd fasf"))
    # outputs: abcd 0 fasf asda 0 1
    

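This is a pretty easy implementation: every word is written out in full the first time it appears, and each later occurrence is replaced by that word's index (counting from 0, rather than from 1 as in the question's example). Note that, unlike real LZW, it only finds repeated single words; it does not build dictionary entries for longer repeated phrases.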
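Decoding this format is just as simple, which is a quick way to check that nothing is lost. A minimal sketch, assuming the indices count distinct words in order of first appearance starting at 0, and that no word in the original text consists only of digits (otherwise a literal word and a back-reference would look the same). `lzw_decode` is a hypothetical name, not part of the original answer.

```python
def lzw_decode(encoded):
    """Invert the word/index encoding: literal words are new entries,
    all-digit tokens are 0-based back-references to earlier words.

    Assumes no original word consists only of digits.
    """
    keys = []  # distinct words in order of first appearance
    out = []
    for tok in encoded.split():
        if tok.isdecimal():
            out.append(keys[int(tok)])  # back-reference to an earlier word
        else:
            keys.append(tok)            # first occurrence: literal word
            out.append(tok)
    return " ".join(out)


print(lzw_decode("abcd 0 fasf asda 0 1"))  # abcd abcd fasf asda abcd fasf
```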