python algorithm data-structures ordereddictionary

finding n largest differences between two lists

I have two lists old and new, with the same number of elements.

I'm trying to write an efficient function that takes n as a parameter, compares the elements of two lists at the same locations (by index), finds n largest differences, and returns the indices of those n elements.

I was thinking this would be best solved by a value-sorted dictionary, but one isn't available in Python (and I'm not aware of any libraries that offer it). Perhaps there's a better solution?

Solution

Whenever you think "n largest", think heapq.

>>> import heapq
>>> import random
>>> l1 = [random.randrange(100) for _ in range(100)]
>>> l2 = [random.randrange(100) for _ in range(100)]
>>> heapq.nlargest(10, (((a - b), a, b) for a, b in zip(l1, l2)))
[(78, 99, 21), (75, 86, 11), (69, 90, 21), (69, 70, 1), (60, 86, 26), (55, 95, 40), (52, 56, 4), (48, 98, 50), (46, 80, 34), (44, 81, 37)]

This will find the x largest items in O(n log x) time, where n is the total number of items in the list; sorting does it in O(n log n) time.

It just occurred to me that the above doesn't actually do what you asked for. You want an index! Still very easy. I'll also use abs here in case you want the absolute value of the difference:

>>> heapq.nlargest(10, xrange(len(l1)), key=lambda i: abs(l1[i] - l2[i]))
[91, 3, 14, 27, 46, 67, 59, 39, 65, 36]