Search code examples
pythonarrayslistoverlap

Find longest overlap between python lists / numpy arrays (not intersection)


I'm wondering if there is an efficient way to find the longest (or any/every) overlap(s) between lists or numpy arrays (a built-in function, a list comprehension or other short method). I've read about using sets to find the intersection (the elements which occur in both) of lists/arrays, but I need to find (the longest/all) common sequence(s) (sub-lists/arrays) that occur in both. Moving towards other data types to help realize this is an option.

Example:

list1 = [1,2,3,5,4,6,9,8]
list2 = [3,8,2,3,5,4,1,2]

overlap = find_overlap(list1,list2) # function find_overlap TBD
# overlap == [2,3,5,4]

The intersection of both lists in the example would return [1,2,3,4,5,8]. This however returns all common elements, rather than the common sequence which I need.

I have tried to find built-in functions, with no success. Neither have I found similar questions being asked here. I could write an iterating or recursive function trying to achieve this, but I'm afraid of running into exploding computing times for large (>100.000 items) size lists/array.


Solution

  • You can use difflib.SequenceMatcher.find_longest_match:

    from difflib import SequenceMatcher
    
    list1 = [1,2,3,5,4,6,9,8]
    list2 = [3,8,2,3,5,4,1,2]
    match = SequenceMatcher(None, list1, list2).find_longest_match()
    print(list1[match.a: match.a + match.size])
    

    This outputs:

    [2, 3, 5, 4]