python nested-lists

Elegant way to find the same consecutive items in two lists


I have multiple lists inside a list, and I want to find out whether any of them match each other at least partially.

Example:

list_of_lists = [["a", "b", "c"], ["z", "a", "b"], ["y", "b", "c"], ["z", "a"]]

The desired output should be:

["a", "b"] # because first and second list
["b", "c"] # because first and third list
["z", "a"] # because second and last list

❗ Order and duplicates matter, so I cannot use a set. The desired output could also look like this: ["a", "b", "a"]

I could probably loop over every item with three or more nested for loops, but that seems like total overkill, and with 100 lists in the list it would be really slow.

If there is a pandas or NumPy function, or anything else in Python, that does this more efficiently, I would appreciate it.


Solution

  • I suspect this isn't the highest-performance solution possible, but I found a method that appears to work. Check how it works with your lists and drop a comment if it needs tweaking.

    It combines the lists into a comma-delimited string, then uses a regex to search for repeating sequences. Note that the pattern assumes every item is a single lowercase letter.

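    import re

    list_of_lists = [["a", "b", "c"], ["z", "a", "b"], ["y", "b", "c"], ["z", "a"]]

    # Group 1 captures a run of 2+ lowercase letters; \1 requires the same run
    # to appear again later in the text. Assumes every item is a single
    # lowercase letter.
    search = re.compile(r'([a-z]{2,}).+?,?.+?\1')

    # Each inner list becomes one "word"; commas separate the lists.
    text = ','.join(''.join(s) for s in list_of_lists)  # 'abc,zab,ybc,za'

    i = 0          # number of characters already stripped from the front
    matches = []   # spans (in original-text coordinates) already reported
    results = []
    while len(text) > 0:
        if result := search.search(text):
            # Shift the span back to original coordinates to detect repeats.
            if (span := tuple(x + i for x in result.span())) not in matches:
                matches.append(span)
                results.append(list(result.group(1)))
        text = text[1:]  # drop the first character and search again
        i += 1
    print(results)  # [['a', 'b'], ['b', 'c'], ['z', 'a']]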
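
If your items are not single lowercase letters, the regex pattern above won't fit. Here is a rough sketch of a different technique, not the regex method: compare every pair of lists with the standard library's `difflib.SequenceMatcher` and keep the longest contiguous run they share. The names `shared_runs` and `min_len` are my own placeholders, and I'm assuming you only need the longest shared run per pair, with at least two items.

```python
from difflib import SequenceMatcher
from itertools import combinations


def shared_runs(list_of_lists, min_len=2):
    """Return the longest contiguous run shared by each pair of lists.

    Hypothetical helper: reports at most one run per pair and skips
    runs shorter than min_len. Items only need to be hashable.
    """
    results = []
    for first, second in combinations(list_of_lists, 2):
        # autojunk=False disables difflib's "popular item" heuristic.
        matcher = SequenceMatcher(None, first, second, autojunk=False)
        match = matcher.find_longest_match(0, len(first), 0, len(second))
        if match.size >= min_len:
            # Slice the run out of the first list; order and duplicates survive.
            results.append(first[match.a:match.a + match.size])
    return results


list_of_lists = [["a", "b", "c"], ["z", "a", "b"], ["y", "b", "c"], ["z", "a"]]
print(shared_runs(list_of_lists))  # [['a', 'b'], ['b', 'c'], ['z', 'a']]
```

With 100 lists this makes 4,950 pairwise comparisons, which should be fine for short lists. If the same run is shared by several pairs it will show up once per pair, so filter the result if you only want unique runs.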