I am trying to write a blog post on linear regression but am stuck on creating a random dataset that is linearly related.
The code below gives me some linearly related random data, but how can I make the spread wider?
import numpy as np
import matplotlib.pyplot as plt

x = np.random.normal(3, 1, 100)
y = 0.77 * (x + np.random.normal(0, 0.1, 100)) + 0.66
plt.figure(figsize=(20, 5))
plt.scatter(x, y)
plt.show()
If you want test data for a regression, you don't want it to be purely random; you want it to follow a line and be noisy. A good way to achieve that is to pick points that lie exactly on your line and then move each one slightly. The size of that move (noise_factor below) controls how wide the spread is.
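import random
import numpy as np

# Evenly spaced points that lie exactly on the line y = 0.5 * x
x = np.linspace(0, 6)
y = np.linspace(0, 3)
noise_factor = 0.2  # largest shift applied to any coordinate

def noise(k):
    # Shift k by a uniform random amount in [-noise_factor, noise_factor)
    return k + ((random.random() * 2) - 1) * noise_factor
x = np.vectorize(noise)(x)
y = np.vectorize(noise)(y)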
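To widen the spread while keeping the setup from the question, one option is to add Gaussian noise directly to y and raise its standard deviation. The following is only a sketch under a few assumptions: it uses NumPy's Generator API, the function name make_linear_data and the parameter noise_std are made up for illustration, and the noise is added outside the slope term rather than inside it as in the question's code.

```python
import numpy as np

def make_linear_data(n=100, slope=0.77, intercept=0.66, noise_std=0.5, seed=0):
    """Return (x, y) where y = slope * x + intercept + Gaussian noise.

    noise_std is a hypothetical knob: larger values give a wider spread
    around the line. The seed only makes the example reproducible.
    """
    rng = np.random.default_rng(seed)
    x = rng.normal(3, 1, n)              # same x distribution as the question
    noise = rng.normal(0, noise_std, n)  # vertical scatter around the line
    y = slope * x + intercept + noise
    return x, y

# Same underlying line, two different amounts of scatter
x_narrow, y_narrow = make_linear_data(noise_std=0.1)
x_wide, y_wide = make_linear_data(noise_std=1.0)
```

Passing either pair to plt.scatter as in the question should show the same trend with a tighter or looser cloud of points. In the answer's uniform-noise version, raising noise_factor plays the same role.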