Search code examples
pythonlistalgorithm

Merging items in list given condition


Let's say I have ['A B', 'B C', 'X Y', 'C D', 'Y Z', 'D E', 'C G'].

If the second word in each element of the list is same as first word in any other elements in the list, they should be merged into one item. The order matters as well.

['A B C D E G', 'X Y Z'] should be the final product.

Letters will not form a cycle, i.e., ['A B', 'B C', 'C A'] is not possible.

As for why G was added at the end, let's say we are at 'A B C' and we see 'C D'. And later on in the list we have 'C G'. We add 'C D' since it comes first, so we have 'A B C D', and then if there is 'D E' we merge that to 'A B C D E'. Now at the end of a 'chain', we add 'C G' so we have 'A B C D E G'. If there had been a 'B H', since B comes first, then it would be 'A B C D E H G'.

Since order matters, ['A B', 'C A'] is still ['A B', 'C A'] since B does not connect to C.

Another example is:

If we have ['A B', 'A C'] then at step 1 we just have A B and then we see A C and we merge that into A B and we have A B C.

I have tried many things, such as dissecting each element into two lists, then indexing and removing etc. It is way too complex and I feel there should be a more intuitive way to solve this in Python. Alas, I can't seem to figure it out.


Solution

  • A simple algorithm solving this appears to be:

    • initialize results as empty list
    • repeat for each pair in input list:
      • repeat for each sublist R in results:
        • if R contains the first item of pair, append second item to R and continue with next pair
      • if no sublist R contained the first item of pair, append pair as new sublist to results

    The implementation in Python is straightforward and should be self-explanatory:

    def merge_pairs(pairs):
        results = []
        for pair in pairs:
            first, second = pair.split()
            for result in results:
                if first in result:
                    result.append(second)
                    break
            else:
                results.append([first, second])
        return [' '.join(result) for result in results]
    

    The only extra steps are the conversions between space-separated letters and lists of letters by .split() and ' '.join().

    Note the else clause of the for statement which is only executed if no break was encountered.

    Some examples:

    >>> merge_pairs(['A B', 'B C', 'X Y', 'C D', 'Y Z', 'D E', 'C G'])
    ['A B C D E G', 'X Y Z']
    >>> merge_pairs(['A B', 'A C'])
    ['A B C']
    >>> merge_pairs(['B C', 'A B'])
    ['B C', 'A B']