I have multiple lists inside a list and I want to find out whether any of them share at least a partial match.
Example:
list_of_lists = [["a", "b", "c"], ["z", "a", "b"], ["y", "b", "c"], ["z", "a"]]
The desired output should be:
["a", "b"] # because first and second list
["b", "c"] # because first and third list
["z", "a"] # because second and last list
❗ Order and duplicates matter, so I cannot use a set. The desired output could be something like ["a", "b", "a"].
I could probably loop over every item with three or more nested for loops, but that seems like total overkill, and with 100 lists in the list it would be really slow.
If there is a pandas or NumPy function, or anything else in Python, that does this more efficiently, I would appreciate it.
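I suspect this isn't the highest-performance solution possible, but I found a method that appears to work. Check how it works with your lists and drop a comment if it needs tweaking.
It combines the lists into a comma-delimited string, then uses a regex to search for repeating sequences. As written, it assumes every item is a single lowercase letter, because the pattern uses [a-z] and the items of each list are concatenated without a separator.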
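import re

list_of_lists = [["a", "b", "c"], ["z", "a", "b"], ["y", "b", "c"], ["z", "a"]]
# Capture a run of 2+ letters, then require the same run to appear again later in the text.
search = re.compile(r'([a-z]{2,}).+?,?.+?\1')
# Join each inner list into a word, then join the words with commas: 'abc,zab,ybc,za'
text = ','.join(''.join(s) for s in list_of_lists)
i = 0  # number of characters already trimmed from the front of text
matches = []  # match spans, in original-string coordinates, that were already recorded
results = []
while text:
    if result := search.search(text):
        # Shift the span back to original coordinates so the same match is not recorded twice.
        if (span := tuple(x + i for x in result.span())) not in matches:
            matches.append(span)
            results.append(list(result.group(1)))
    # Drop the first character and search again from the next position.
    text = text[1:]
    i += 1
print(results)  # [['a', 'b'], ['b', 'c'], ['z', 'a']]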
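If your items are not single lowercase letters, a regex-free variant may be easier to adapt. Here is a minimal sketch, assuming that a "partial match" means a contiguous run of at least two items that appears in two or more different lists, and that the items are hashable. The names `shared_runs` and `min_len` are made up for illustration. Note that it reports every shared run, so a shorter run contained in a longer shared run is listed too.

```python
from collections import defaultdict


def shared_runs(list_of_lists, min_len=2):
    """Return contiguous runs of at least min_len items found in two or more lists."""
    # Maps each run (stored as a tuple so it is hashable) to the indices
    # of the lists that contain it.
    owners = defaultdict(set)
    for idx, items in enumerate(list_of_lists):
        for start in range(len(items)):
            for stop in range(start + min_len, len(items) + 1):
                owners[tuple(items[start:stop])].add(idx)
    # Keep only runs seen in at least two different lists, in first-seen order.
    return [list(run) for run, idxs in owners.items() if len(idxs) >= 2]


list_of_lists = [["a", "b", "c"], ["z", "a", "b"], ["y", "b", "c"], ["z", "a"]]
print(shared_runs(list_of_lists))  # [['a', 'b'], ['b', 'c'], ['z', 'a']]
```

Because tuples preserve order and duplicates, a run such as ["a", "b", "a"] is handled the same way as any other. Each list is scanned once rather than compared against every other list, which should be fine for around 100 short lists, though I have not benchmarked it.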