
Python random draws 5,000 times


I want to randomly draw N = 30 slope and intercept pairs, with replacement, and repeat this F = 5,000 times. For each draw I want to calculate the slope and intercept of the regression line, and then plot histograms of the slopes and the intercepts. Here is the code I have so far:

F = 10000
N = 30
X = sigma*(np.random.randn(F)/F)
Y = beta*X + alpha + sigma*(np.random.randn(F))
Xbar = np.mean(X)
Ybar = np.mean(Y)
numer2 = 0
denom2 = 0
for i in range(F):
    for j in range(N):
        numer2 += (X[j]-Xbar)*(Y[j]-Ybar)
        denom2 += (X[j]-Xbar)**2
        slope = numer2/denom2
        intercept = Ybar - slope*Xbar

plt.figure(1)
plt.hist(slope, bins=50)
plt.hist(intercept, bins=50)
plt.grid()
plt.show()

I want to get 30 slope and intercept pairs, 5,000 times. I thought the double for loop would do that. Unfortunately, all I can get is one value for each. How can I fix this?


Solution

  • There are two errors. First, as @GreenCloakGuy pointed out, you are not storing the slope and intercept values. Second, your inner loop does not sample randomly from X and Y; it only walks over the first N elements. You also don't need a loop for the calculations themselves, because NumPy array operations are vectorized:

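    import numpy as np

    F = 5000  # number of random draws
    N = 30    # points per draw

    sigma = 0.5
    beta = 2
    alpha = 0.2

    # simulated data: Y is linear in X plus Gaussian noise
    X = np.random.randn(F)
    Y = beta*X + alpha + sigma*(np.random.randn(F))
    # note: these are the means of the full data set, not of each draw
    Xbar = np.mean(X)
    Ybar = np.mean(Y)

    slopes = []
    intercepts = []
    for i in range(F):
        # N random indices into X and Y, drawn with replacement
        j = np.random.randint(0,F,N)
        numer2 = np.sum((X[j]-Xbar)*(Y[j]-Ybar))
        denom2 = np.sum((X[j]-Xbar)**2)
        slope = numer2/denom2
        intercept = Ybar - slope*Xbar
        # store one value per draw so there is something to histogram
        slopes.append(slope)
        intercepts.append(intercept)
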

    I'm not entirely sure what your code is trying to do or how sigma is meant to be used, but the above should give you a distribution of slopes and intercepts.
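
    As a possible variation on the loop above, here is a minimal sketch that draws all F index sets at once and centres each draw on its own sample means, which is what an ordinary least-squares fit of each individual draw would use. This differs from the code above, which centres on the means of the full data set. The helper name `bootstrap_ols`, the seed, and the use of `np.random.default_rng` are assumptions made for illustration, not part of the original answer.

```python
import numpy as np

def bootstrap_ols(x, y, n_draws, sample_size, rng):
    """Return one OLS slope and intercept per random draw (hypothetical helper)."""
    # One row of indices per draw, sampled with replacement
    idx = rng.integers(0, len(x), size=(n_draws, sample_size))
    xs, ys = x[idx], y[idx]
    # Centre each draw on its own means (assumption: per-draw OLS is wanted)
    x_mean = xs.mean(axis=1, keepdims=True)
    y_mean = ys.mean(axis=1, keepdims=True)
    dx = xs - x_mean
    slopes = (dx * (ys - y_mean)).sum(axis=1) / (dx ** 2).sum(axis=1)
    intercepts = y_mean[:, 0] - slopes * x_mean[:, 0]
    return slopes, intercepts

# Same simulated data as in the answer, with a fixed seed for repeatability
rng = np.random.default_rng(0)
alpha, beta, sigma = 0.2, 2.0, 0.5
x = rng.standard_normal(5000)
y = beta * x + alpha + sigma * rng.standard_normal(5000)

slopes, intercepts = bootstrap_ols(x, y, n_draws=5000, sample_size=30, rng=rng)
```

    Both results are arrays with one entry per draw, so they can be passed directly to `plt.hist(slopes, bins=50)` and `plt.hist(intercepts, bins=50)`. Putting the two histograms on separate figures is probably easier to read than overlaying them, since the slopes and intercepts are centred on different values.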