I'm wondering if there is an efficient way to find the longest (or any/every) overlap(s) between lists or numpy arrays (a built-in function, a list comprehension or other short method). I've read about using sets to find the intersection (the elements which occur in both) of lists/arrays, but I need to find (the longest/all) common sequence(s) (sub-lists/arrays) that occur in both. Moving towards other data types to help realize this is an option.
Example:
list1 = [1,2,3,5,4,6,9,8]
list2 = [3,8,2,3,5,4,1,2]
overlap = find_overlap(list1,list2) # function find_overlap TBD
# overlap == [2,3,5,4]
The intersection of both lists in the example would return [1,2,3,4,5,8]. This however returns all common elements, rather than the common sequence which I need.
I have tried to find built-in functions, with no success. Neither have I found similar questions being asked here. I could write an iterating or recursive function trying to achieve this, but I'm afraid of running into exploding computing times for large (>100.000 items) size lists/array.
You can use difflib.SequenceMatcher.find_longest_match
:
from difflib import SequenceMatcher
list1 = [1,2,3,5,4,6,9,8]
list2 = [3,8,2,3,5,4,1,2]
match = SequenceMatcher(None, list1, list2).find_longest_match()
print(list1[match.a: match.a + match.size])
This outputs:
[2, 3, 5, 4]