Tags: python, performance, frequency-distribution

python for loop performance


How can I improve the performance of a Python 'for' loop, and how can I decide whether my loop is efficient? If it runs X iterations, what is an acceptable running time?

I am trying to write a function that builds a frequency distribution table in Python. I have continuous data in a NumPy array, and I want to create class intervals and count how many elements fall into each one (I use a 'for' loop to do this). I have written the function, but I am not sure whether it is efficient.

import numpy as np

def maketable(data, bins):
    data = np.array(data)
    # Class limits: bins evenly spaced edges give bins-1 class intervals
    edges = np.linspace(min(data), max(data), bins)
    # {(lower limit, upper limit): frequency}
    classes = {(edges[x], edges[x+1]): 0 for x in range(bins-1)}
    # For every value, find the first interval that contains it and increment its frequency
    for val in data:
        for interval in classes.keys():
            if val >= interval[0] and val <= interval[1]:
                classes[interval] += 1
                break
    return classes

The timing output is "Finished 'maketable' in 0.17328 secs". The data contains 20,604 values, and the function takes 0.17 seconds to complete. I would like to know whether that is acceptable. I appreciate any kind of help.


Solution

  • It looks like what you are actually trying to compute is a histogram of the data. Your function could then be implemented with NumPy:

    classes, bins = np.histogram(data, bins=bins)
    

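    Then you can return classes, which holds the count for each bin; the second return value holds the bin edges. Note that an integer bins argument gives that many equal-width bins, whereas the linspace call in the question produces bins-1 intervals.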
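
As a minimal sketch of that suggestion, the function below rebuilds the question's dictionary from np.histogram. The name maketable_np and the sample data are made up for illustration. It reuses the question's linspace edges so that the number of intervals stays bins-1. One difference from the loop version: np.histogram treats every bin except the last as half-open, so a value that sits exactly on an interior edge is counted in the upper interval rather than the lower one.

```python
import numpy as np

def maketable_np(data, bins):
    """Frequency table keyed by (lower, upper) class limits, built with np.histogram."""
    data = np.asarray(data)
    # Same edges as in the question: bins points give bins - 1 intervals.
    edges = np.linspace(data.min(), data.max(), bins)
    # Passing explicit edges makes np.histogram count values per interval.
    counts, edges = np.histogram(data, bins=edges)
    return {(edges[i], edges[i + 1]): int(counts[i]) for i in range(len(counts))}

# Hypothetical sample data, only for illustration.
data = np.array([0.5, 1.2, 2.5, 3.1, 3.9, 4.0])
table = maketable_np(data, 4)
for (lower, upper), freq in table.items():
    print(f"{lower:.3f} to {upper:.3f}: {freq}")
```

To judge whether either version is fast enough, it can help to time both on the same data, for example with timeit.timeit(lambda: maketable_np(data, 4), number=100) from the standard library, and compare the results rather than look for an absolute threshold.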