Search code examples
pythonalgorithmedit-distance

How to fit strings using spaces, minimizing edit distance?


I'm looking for an algorithm that fits two strings, filling them up with spaces if necessary to minimize edit distance between them:

fit('algorithm', 'lgrthm') == ' lg r thm'

There sure must be some prewritten algorithm for this. Any ideas?


Solution

  • You could do something like the following:

    def fit(target, source):
        i, j = 0, 0
        result = []
        while i < len(source) and j < len(target):
            if source[i] == target[j]:
                result.append(source[i])
                i += 1
            else:
                result.append(' ')
            j += 1
    
        return ''.join(result)
    
    
    test = [('algorithm', 'lgrthm'), ('pineapple', 'pine'), ('pineapple', 'apple'), ('pineapple', 'eale'),
            ('foo', 'fo'), ('stack', 'sak'), ('over', 'or'), ('flow', 'lw')]
    
    for t, s in test:
        print(t)
        print(fit(t, s))
        print('---')
    

    Output

    algorithm
     lg r thm
    ---
    pineapple
    pine
    ---
    pineapple
        apple
    ---
    pineapple
       ea  le
    ---
    foo
    fo
    ---
    stack
    s a k
    ---
    over
    o  r
    ---
    flow
     l w
    ---
    

    A perhaps better version, is the following:

    from collections import deque
    
    
    def peak(q, default=' '):
        """Perform a safe peak, if the queue is empty return default"""
        return q[0] if q else default
    
    
    def fit(target, source):
        ds = deque(source)
        return ''.join([ds.popleft() if peak(ds) == e else ' ' for e in target])
    

    Is better in the sense that you do not need to keep track of state variables i, j like in the previous approach.