python, comparison, dataset

Number of elements in Python Set


I have a list of phone numbers that have been dialed (nums_dialed). I also have a set of phone numbers which are the numbers in a client's office (client_nums). How do I efficiently figure out how many times I've called a particular client (total)?

For example:

>>> nums_dialed = [1, 2, 2, 3, 3]
>>> client_nums = set([2, 3])
>>> ???
total = 4

The problem is that I have a large-ish dataset: len(client_nums) ~ 10^5 and len(nums_dialed) ~ 10^3.


Solution

  • Which client has 10^5 numbers in his office? Do you do work for an entire telephone company?

    Anyway:

    print(sum(1 for num in nums_dialed if num in client_nums))
    

    That gives you the count in a single pass over nums_dialed, since each membership test against a set takes constant time on average.


    If you want to do this for multiple clients using the same nums_dialed list, you could first count how many times each number was dialed:

    import collections

    # Map each dialed number to the number of times it was dialed.
    nums_dialed_dict = collections.defaultdict(int)
    for num in nums_dialed:
        nums_dialed_dict[num] += 1
    

    Then sum the counts for each client's numbers:

    # .get() avoids inserting a zero entry for every number that was never dialed.
    sum(nums_dialed_dict.get(num, 0) for num in this_client_nums)
    

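    That avoids iterating over the entire list of dialed numbers again for each client, which pays off when a client has fewer numbers than there are dialed calls. With the sizes in the question (about 10^5 client numbers against 10^3 dialed numbers) the reverse holds, so it is cheaper to loop over the distinct dialed numbers and test each one against the client's set.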
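
Putting both approaches together, here is a minimal sketch rather than a definitive implementation. The function names and the `clients` mapping (client name to set of numbers) are assumptions made for illustration; they are not in the question. It uses collections.Counter in place of the defaultdict, since a Counter returns 0 for a missing key without storing it, and for each client it walks whichever side is smaller.

```python
from collections import Counter


def count_calls(nums_dialed, client_nums):
    """Count dialed calls whose number belongs to one client's set."""
    return sum(1 for num in nums_dialed if num in client_nums)


def count_calls_per_client(nums_dialed, clients):
    """Return {client name: total calls} for several clients at once.

    `clients` is a hypothetical mapping of client name -> set of numbers.
    """
    dial_counts = Counter(nums_dialed)  # number -> times dialed
    totals = {}
    for name, client_nums in clients.items():
        # Iterate over the smaller collection; lookups in the other
        # one (set or Counter) are constant time on average.
        if len(dial_counts) <= len(client_nums):
            totals[name] = sum(
                count for num, count in dial_counts.items() if num in client_nums
            )
        else:
            totals[name] = sum(dial_counts[num] for num in client_nums)
    return totals


nums_dialed = [1, 2, 2, 3, 3]
print(count_calls(nums_dialed, {2, 3}))
print(count_calls_per_client(nums_dialed, {"a": {2, 3}, "b": {1, 9}}))
```

For the data in the question the first call gives 4. Counting once up front only helps when the same nums_dialed list is reused across several clients; for a single client the one-line generator expression is enough.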