
Concatenating similar items in a list - Python


I have a list containing both similar and unique words. Similar words appear together in one string, separated by "|".

input = ["car | cat", "cat | caat", "car | caar", "dog", "ant | ants"]

I want the following output, which shows that car, cat, caat, and caar are all similar, instead of repeating them across separate pairs. The target output looks like this:

output= ["car | cat | caat | caar", "dog" , "ant | ants"]

So far, I've managed to get ["car | cat | caat | caar", "dog", "ant", "ants"]. But I want to keep "ant | ants" intact, since it has no word in common with any other pair.

Can someone write Python code to solve this problem?

Edit:

Here is the code for my attempt, but please don't feel that you have to use the same approach.

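from collections import Counter

def concat_common_words(input):
    my_list = input
    # split every string into its individual words
    split_my_list = [x.split(" | ") for x in my_list]

    # flatten into one list of words
    flat_my_list = [i for j in split_my_list for i in j]

    # count how often each word occurs across all strings
    count_my_list = Counter(flat_my_list)

    # words that occur more than once link two or more strings
    common = [k for k, v in count_my_list.items() if v > 1]

    # strings that contain at least one of the common words
    target_my_list = [x for x in my_list if any(c in x for c in common)]

    # all words from those strings, as one set
    flat_target_my_list = set(sf for sfs in target_my_list for sf in sfs.split(" | "))

    # one merged string, plus every remaining word as its own item
    # (this last step is what splits "ant | ants" into "ant" and "ants")
    merged = [" | ".join(flat_target_my_list)] \
    + list(set(flat_my_list) - flat_target_my_list) 

    return merged
concat_common_words(["car | cat", "cat | caat", "car | caar", "dog", "ant | ants"])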

It returns ["car | cat | caat | caar", "dog", "ant", "ants"]. But as I mentioned, I want to keep "ant | ants" intact.
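
For reference, the pair gets split because the last step of the attempt rebuilds the leftover items from individual words rather than from the original strings. A minimal sketch of one way to patch just that step, assuming the same " | " separator; the function name concat_common_words_keep_pairs is made up for illustration and is not part of the original attempt:

```python
from collections import Counter

def concat_common_words_keep_pairs(items):
    # split every string into its words
    split_items = [x.split(" | ") for x in items]

    # words that occur more than once across all strings
    counts = Counter(w for words in split_items for w in words)
    common = {w for w, n in counts.items() if n > 1}

    # words from strings that share a repeated word, in first-seen order
    linked_words = list(dict.fromkeys(
        w for words in split_items if common.intersection(words) for w in words
    ))

    # strings with no repeated word are kept exactly as they were given
    untouched = [
        x for x, words in zip(items, split_items)
        if not common.intersection(words)
    ]

    merged = [" | ".join(linked_words)] if linked_words else []
    return merged + untouched

print(concat_common_words_keep_pairs(
    ["car | cat", "cat | caat", "car | caar", "dog", "ant | ants"]
))
```

Note that this keeps the attempt's overall design, so every string containing a repeated word still ends up in one single group, even if the input holds two unrelated clusters. Merging group by group avoids that.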


Solution

  • # I would create a set() for each group, e.g. "car | cat" becomes {"car", "cat"}.
    # When adding a new group, I would merge it with every existing group
    # that it intersects.
    
    data = ["car | cat", "cat | caat", "car | caar", "dog", "ant | ants"]
    
    
    groups = []
    for item in data:
        words = set(item.split(" | "))
        to_remove = []
        for existing_group in groups:
            if words.intersection(existing_group):
                words.update(existing_group)
                to_remove.append(existing_group)
        for removal in to_remove:
            groups.remove(removal)
        groups.append(words)
    
    # convert groups back to pipe-separated strings
    # (sets are unordered, so the word order inside each string is not guaranteed)
    final_groups = [" | ".join(group) for group in groups]
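
If the order of the words matters, the same merge-on-intersection idea can be written with lists instead of sets. The following is only a sketch under a few assumptions: the function name merge_similar and its sep parameter are hypothetical, words are split on "|" and stripped of surrounding spaces, and a merged group moves to the position of the item that triggered the merge (as it does in the set-based version).

```python
def merge_similar(items, sep=" | "):
    groups = []  # each group is a list of words in first-seen order
    for item in items:
        # split on the pipe, trim spaces, drop duplicates but keep order
        words = list(dict.fromkeys(w.strip() for w in item.split("|")))
        merged = []
        remaining = []
        for group in groups:
            if set(group) & set(words):
                # this existing group shares a word with the new item
                merged.extend(group)
            else:
                remaining.append(group)
        merged.extend(words)
        # de-duplicate the combined words while preserving order
        remaining.append(list(dict.fromkeys(merged)))
        groups = remaining
    return [sep.join(group) for group in groups]

data = ["car | cat", "cat | caat", "car | caar", "dog", "ant | ants"]
print(merge_similar(data))
# ['car | cat | caat | caar', 'dog', 'ant | ants']
```

Because each new item is checked against every existing group, an item that bridges two previously separate groups pulls both of them together, e.g. ["a | b", "c | d", "b | c"] collapses into a single "a | b | c | d".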