I want to randomly draw N = 30 slope and intercept pairs, with replacement, and do it F = 5,000 times. For each draw I want to calculate the slope and intercept of the regression line and then plot the histogram of slope and intercept. Here is the code I have so far.
F = 5000
N = 30
X = sigma*(np.random.randn(F)/F)
Y = beta*X + alpha + sigma*(np.random.randn(F))
Xbar = np.mean(X)
Ybar = np.mean(Y)
numer2 = 0
denom2 = 0
for i in range(F):
    for j in range(N):
        numer2 += (X[j]-Xbar)*(Y[j]-Ybar)
        denom2 += (X[j]-Xbar)**2
    slope = numer2/denom2
    intercept = Ybar - slope*Xbar
plt.figure(1)
plt.hist(slope, bins=50)
plt.hist(intercept, bins=50)
plt.grid()
plt.show()
I want to get 30 slope and intercept pairs, 5,000 times. I thought the double for loop would do that. Unfortunately, all I can get is one value for each. How can I fix this?
There are two errors. First, as @GreenCloakGuy pointed out, you are not storing the slope and intercept values. Second, you are not sampling randomly from your X and Y in the inner loop: it always reads the same first N points. Also, you don't need an inner loop for the calculations, because NumPy array operations are vectorized:
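import numpy as np

F = 5000      # number of random draws
N = 30        # points per draw
sigma = 0.5   # noise standard deviation
beta = 2      # true slope
alpha = 0.2   # true intercept

# Simulated population of F (x, y) points
X = np.random.randn(F)
Y = beta*X + alpha + sigma*(np.random.randn(F))

slopes = []
intercepts = []
for i in range(F):
    # Draw N indices at random, with replacement
    j = np.random.randint(0, F, N)
    xs, ys = X[j], Y[j]
    # Least-squares fit, centered on the means of this sample
    xbar, ybar = np.mean(xs), np.mean(ys)
    slope = np.sum((xs - xbar)*(ys - ybar)) / np.sum((xs - xbar)**2)
    intercept = ybar - slope*xbar
    slopes.append(slope)
    intercepts.append(intercept)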
I'm not entirely sure what your code is trying to do, or where the sigma values are supposed to come in. The code above should give you a distribution of slopes and intercepts.
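If the outer loop is also unwanted, the whole resampling step can probably be done in one shot by drawing a 2-D array of indices. The following is only a sketch under some assumptions: the parameter values are copied from the answer, the seed and the `rng`, `idx`, `xs`, `ys`, `dx`, `dy` names are arbitrary choices made here for illustration, and each fit uses the means of its own 30-point sample.

```python
import numpy as np

# Assumed values, copied from the answer; the seed is an arbitrary choice.
rng = np.random.default_rng(0)
F, N = 5000, 30
sigma, beta, alpha = 0.5, 2.0, 0.2

# Simulated population of F (x, y) points.
X = rng.standard_normal(F)
Y = beta * X + alpha + sigma * rng.standard_normal(F)

# One row of N indices per draw, sampled with replacement.
idx = rng.integers(0, F, size=(F, N))
xs, ys = X[idx], Y[idx]            # both have shape (F, N)

# Per-draw least-squares fit, centered on each row's own means.
dx = xs - xs.mean(axis=1, keepdims=True)
dy = ys - ys.mean(axis=1, keepdims=True)
slopes = (dx * dy).sum(axis=1) / (dx ** 2).sum(axis=1)
intercepts = ys.mean(axis=1) - slopes * xs.mean(axis=1)
```

Passing `slopes` and `intercepts` to two separate `plt.hist` calls, one figure each, should then give the two histograms from the question.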