python performance frequency-distribution

python for loop performance

Improve performance in a Python 'for' loop: how can I decide whether my loop is efficient or not? If it runs for X iterations, what would be an acceptable running time?

I was trying to write a function that creates a frequency distribution table in Python. I have continuous data in the form of a NumPy array, and I want to make class intervals and place each element into its interval (I use a 'for' loop to do this). I have written the function, but I'm not convinced that it is efficient.

import numpy as np

def maketable(data, bins):
    data = np.array(data)
    # `bins` evenly spaced edges give bins-1 class intervals
    edges = np.linspace(min(data), max(data), bins)
    # {(lower limit, upper limit): frequency}
    classes = {(edges[x], edges[x+1]): 0 for x in range(bins-1)}
    # for every value, find the first interval that contains it and increment its frequency
    for val in data:
        for interval in classes.keys():
            if val >= interval[0] and val <= interval[1]:
                classes[interval] += 1
                break
    return classes

The timing output was "Finished 'maketable' in 0.17328 secs". The data contains 20,604 values, so the function takes about 0.17 seconds to complete. I would like to know whether that is acceptable. I appreciate any kind of help.

Solution

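It looks like what you are actually trying to compute is a histogram of the data, and NumPy already provides that. Note that np.histogram takes the number of intervals, whereas your function treats bins as the number of edges (giving bins-1 intervals), so the equivalent call is:

classes, edges = np.histogram(data, bins=bins-1)

Here classes is an array holding the frequency of each interval and edges holds the interval limits, so you can return classes (or pair it with edges if you also need the limits).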
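
If you want the same dictionary shape as the original function, a minimal sketch could look like the following. The name maketable_np and the random sample data are made up for illustration; only np.histogram itself comes from the answer above. It assumes, as in the question, that bins is the number of edges.

```python
import numpy as np

def maketable_np(data, bins):
    """Frequency table as {(lower, upper): count}, computed with np.histogram.

    As in the question's function, `bins` is the number of edges,
    so the table has bins - 1 class intervals.
    """
    data = np.asarray(data)
    # np.histogram expects the number of intervals, hence bins - 1
    counts, edges = np.histogram(data, bins=bins - 1)
    # pair consecutive edges with the count of the interval between them
    return {
        (float(lo), float(hi)): int(n)
        for lo, hi, n in zip(edges[:-1], edges[1:], counts)
    }

# hypothetical sample data, same size as in the question
rng = np.random.default_rng(0)
data = rng.normal(size=20_604)
table = maketable_np(data, 11)  # 11 edges -> 10 class intervals
```

One difference to be aware of: np.histogram uses half-open intervals [lower, upper) for every bin except the last, so a value lying exactly on an interior edge is counted in the upper interval, whereas the loop in the question puts it in the lower one. To judge whether either version is fast enough for your data, you could time both on the same input, for example with timeit.timeit(lambda: maketable_np(data, 11), number=100), and compare the results.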