python numpy dictionary enumerate

enumerate in a dictionary loop takes a long time: how to improve the speed?


I am using Python 3.x and would like to speed up my code. In every loop iteration I create new values and check whether they already exist in a dictionary; if a value exists, I keep the index at which it was found. I am using enumerate for this, which is a very clear way to write it but takes a long time. Is there a faster way to do this, or is enumerate the only option in my case? I am not sure whether using numpy would be better here.

Here is my code:

# import numpy
import numpy as np

# my first array
my_array_1 = np.random.choice ( np.linspace ( -1000 , 1000 , 2 ** 8 ) , size = ( 100 , 3 ) , replace = True )
my_array_1 = np.array(my_array_1)




# find the unique rows (triplets) of my_array_1
indx = np.unique(my_array_1, return_index=True, return_counts=True, axis=0)


# then save the result to a dictionary
dic_t = {"my_array_uniq": indx[0],  # unique rows in my_array_1
         "counts": indx[2]}  # how many times each unique row appears in my_array_1


# create a random array 100 times
for i in range(100):

    print(i)

    # my 2nd array
    my_array_2 = np.random.choice(np.linspace(-1000, 1000, 2 ** 8), size=(100, 3), replace=True)
    my_array_2 = np.array(my_array_2)

    # Check whether each row of my_array_2 already exists in dic_t["my_array_uniq"].
    # If it exists, keep the index of that row in the dictionary and add 1 to
    # dic_t["counts"] at that index, to count how many times the row has appeared.
    # If it does not exist, append the row to dic_t["my_array_uniq"] and
    # append a count of 1 to dic_t["counts"].
    # row_idx is kept distinct from the outer loop variable i.
    for row_idx, a in enumerate(my_array_2):

        # linear scan over every stored row: this is the slow part
        ix = [k for k, j in enumerate(dic_t["my_array_uniq"]) if (a == j).all()]
        if ix:

            print(50 * "*", row_idx, "Yes", "at", ix[0])
            dic_t["counts"][ix[0]] += 1

        else:
            # print(50 * "*", row_idx, "No")
            dic_t["counts"] = np.hstack((dic_t["counts"], 1))
            dic_t["my_array_uniq"] = np.vstack((dic_t["my_array_uniq"], a))

Explanation:

1- I create an initial array.
2- I find the unique values, indices and counts of the initial array using np.unique.
3- I save the result to the dictionary (dic_t).
4- I start a loop that creates random values 100 times.
5- I check whether the random values in my_array_2 exist in the dictionary ("my_array_uniq": indx[0]).
6- If one of them exists, I keep the index of that value in the dictionary.
7- I add 1 to dic_t["counts"] at that index, which means the value has appeared again, to count how many times it occurs.
8- If it does not exist, I add the value to the dictionary as a new unique value ("my_array_uniq": indx[0]).
9- I also append 1 to dic_t["counts"].

Solution

  • So from what I can see you are

    • Creating 256 evenly spaced numbers between -1000 and 1000 (np.linspace)
    • Generating 100 random triplets from those (np.unique could leave fewer than 100, but with overwhelming probability it will be exactly 100)
    • Then doing pretty much the same thing 100 times and each time checking for each of the triplets in the new list whether they exist in the old list.
    • You're then trying to get a count of how often each element occurs.

    I'm wondering why you're trying to do this, because it doesn't make much sense to me, but I'll give a few pointers:

    • There's no reason to make a dictionary dic_t if you're only going to hold two objects in it; just use two variables, my_array_uniq and counts.
    • You're dealing with triplets of floating point numbers. For arbitrary floats in the given range there would be an absurdly large number of different possible triplets (something like 10^48; I may be wrong on the exact number). The way you're generating them does reduce the total phase-space a lot, to 256^3 (about 1.7 × 10^7) possible triplets, but the probability that any given triplet has been seen before is still very low.
    • If you have a set of objects (in this case number triplets) and you want to determine whether you have seen a given one before, you want to use sets. Sets can only contain hashable objects, and numpy arrays are not hashable, so you want to turn your triplets into tuples. Determining whether a given triplet is already contained in your set is then an O(1) operation on average.
    • For counting the number of occurrences of something, collections.Counter is the natural data structure to use.
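
    The last two pointers can be combined into a rough sketch like the one below. It is only an illustration, not a drop-in replacement: the names rng, grid, random_triplets, add_rows and first_index are made up for this example, and the seed is there only so the run is repeatable. One assumption to note: the index stored for each triplet is its position in order of first appearance, which differs from the sorted order that np.unique returns.

```python
from collections import Counter

import numpy as np

rng = np.random.default_rng(0)           # seeded only so the sketch is repeatable
grid = np.linspace(-1000, 1000, 2 ** 8)  # the 256 candidate values from the question


def random_triplets(n=100):
    """Draw n triplets from the grid, as the question does (hypothetical helper)."""
    return rng.choice(grid, size=(n, 3), replace=True)


counts = Counter()  # triplet (as a tuple) -> how many times it has been seen
first_index = {}    # triplet -> index, in order of first appearance


def add_rows(rows):
    """Record every row; each membership test is a hash lookup, not a scan."""
    for row in rows:
        key = tuple(row.tolist())  # numpy rows are not hashable, tuples are
        if key not in first_index:
            first_index[key] = len(first_index)  # new triplet: next free index
        counts[key] += 1


add_rows(random_triplets())      # the initial array
for _ in range(100):             # then 100 further random arrays
    add_rows(random_triplets())

# Convert back to arrays if the rest of the code expects them.
my_array_uniq = np.array(list(first_index))
count_array = np.array([counts[key] for key in first_index])
```

    Each row now costs one hash lookup instead of a comparison against every stored row, and nothing is re-stacked with np.vstack inside the loop. If the "found at index" value is needed, first_index[key] gives it directly.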