Search code examples
pythonperformancecpu-speed

program speed/unnecessary double-counting


I am trying to find the number of pairs in a list of numbers with a specific difference. Say, with the list

1 2 3 4 5

and the difference target '2', I would want to print the number '3' because there are 3 pairs in this sequence with a difference of '2'. however, my code is super slow - it double-counts all of the pairs, and so I end up needing to divide my solutions by 2 to get the answer. Is there a way to accomplish this same task without double-counting? I appreciate any insights you might have. thanks! code is printed below

    import sys


    def main():
        solutions=0
        pairs=[]
        for i in xrange(len(numbers)):
            for j in xrange(len(numbers)):
                if i!=j:
                    pairs.append([numbers[i], numbers[j]])

        for pair in pairs:
            if abs(pair[0]-pair[1])==k:
                solutions+=1
            else:
                continue
        return solutions/2



    if __name__ == '__main__':
        lines=sys.stdin.readlines()
        n,k=map(int, lines[0].strip().split())
        numbers=map(int, lines[1].strip().split())
        print main()

Solution

  • For each element i in a, you want to check whether i-diff is also in a. For ~O(1) membership testing, we can use a set. Thus:

    >>> a = [1,2,3,4,5]
    >>> diff = 2
    >>> a_set = set(a)
    >>> sum(i-diff in a_set for i in a_set)
    3
    

    which is O(len(a)).

    [Note that I've used the fact that i-diff in a_set, which is a bool, evaluates to 1 as an int. This is equivalent to sum(1 for i in a_set if i-diff in a_set).]

    Update: it occurs to me that I've assumed that the numbers are unique. If they're not, that's okay, we could just use a collections.Counter instead to keep the multiplicity information.