Search code examples
pythonoptimizationmicro-optimization

Search of interval optimization


My goal is to find the probability density function for a certain distribution, using a given algorithm.

This algorithm requires that I search in which interval a float is placed in. Even though the code runs perfectly, it takes too long. I was looking for a way of optimizing my code, but none came to mind.

In each iteration I check if the float is in the interval: if that's the case, I'd like to had a unity to the position I'm considering, in array p.

This is my code:

import numpy as np
import pylab as plt
import random as rd

n = [10,100,1000]
N = [10**6]
dy = 0.005
k_max = int(1/dy-1)
y = np.array([(j+0.5)*dy for j in range(k_max+1)])
intervals = np.linspace(0,1,k_max+2)

def p(y,n,N):

   p = np.zeros(len(y))

   Y = np.array([sum(np.array([rd.random() for k in range(n)]))/n for j in range(N)])
   z = np.array([sum(np.array([rd.random() for k in range(n)])) for l in range(N)])

   for j in Y:
       for i in range(len(y)-1):
           if intervals[i] <= j < intervals[i+1]:
               p[i] += 1

   return(p/(dy*N))

for a in n:
    pi = p(y,a,N[0])

    plt.plot(y,pi,label = 'n = ' + str(a))

plt.title('Probability Density Function')
plt.xlabel('x')
plt.ylabel('p(x)')
plt.show()

Edit: I've added the full code, as requested. Edit 2: Fixed an error intervals.


Solution

  • A quick and simple optimization can be made here:

       for j in Y:
           for i in range(len(y)-1):
               if intervals[i] <= j < intervals[i+1]:
                   p[i] += 1
    

    Since intervals consists of len(y) evenly spaced numbers over the interval [0, 1], which is also the range of Y values, we need not search the position of j in intervals, but rather we can compute it.

        for j in Y: p[int(j*(len(y)-1))] += 1
    

    Also we can remove the unused

       z = np.array([sum(np.array([rd.random() for k in range(n)])) for l in range(N)])
    

    The greatest part of the remaining execution time is taken by

       Y = np.array([sum(np.array([rd.random() for k in range(n)]))/n for j in range(N)])
    

    Here the inner conversions to np.array are very time consuming; better leave them all out:

       Y = [sum([rd.random() for k in range(n)])/n for j in range(N)]