python, random, linear-regression

How can I create linearly related random data in Python?


I am trying to write a blog post on linear regression, but I am stuck on creating a random dataset that is linearly related.

The code below gives me roughly linearly related random data, but how can I make the spread wider?

import numpy as np
import matplotlib.pyplot as plt

x = np.random.normal(3, 1, 100)
y = 0.77 * (x + np.random.normal(0, 0.1, 100)) + 0.66

plt.figure(figsize=(20, 5))
plt.scatter(x, y)
plt.show()



Solution

  • For test data you don't want fully random points; you want points that follow a known line, with some noise added. A good way to get that is to pick exact points on your line and then shift each one by a small random amount. Increasing the size of that shift (noise_factor below) widens the spread.

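    import random
    import numpy as np

    # Exact points on the line y = 0.5 * x (linspace returns 50 samples by default)
    x = np.linspace(0, 6)
    y = np.linspace(0, 3)

    noise_factor = 0.2  # increase this to widen the spread

    def noise(k):
        # Shift k by a uniform random offset in [-noise_factor, noise_factor)
        return k + (random.random() * 2 - 1) * noise_factor

    x = np.vectorize(noise)(x)
    y = np.vectorize(noise)(y)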
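
The same idea can be applied directly to the question's setup. In the question's code the noise has a standard deviation of 0.1 and is then multiplied by 0.77, which is why the points hug the line so closely. Below is a minimal sketch, assuming Gaussian noise added to y only; the names `slope`, `intercept`, and `noise_sd` are illustrative and not from the original posts, and the slope and intercept values are taken from the question.

```python
import numpy as np

# Seeded generator so the example is reproducible (the seed value is arbitrary)
rng = np.random.default_rng(seed=0)

slope, intercept = 0.77, 0.66  # values from the question
noise_sd = 0.5                 # assumed value; larger means a wider spread

x = rng.normal(loc=3, scale=1, size=100)
noise = rng.normal(loc=0, scale=noise_sd, size=x.shape)
y = slope * x + intercept + noise

# Sanity check: a least-squares line fit should land near the true slope
# and intercept, and the residual spread should be close to noise_sd
fit_slope, fit_intercept = np.polyfit(x, y, deg=1)
residual_sd = np.std(y - (fit_slope * x + fit_intercept))
```

Plotting `x` against `y` with `plt.scatter` as in the question should show a visibly wider band around the line; raising or lowering `noise_sd` controls how wide it is.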