Confused by random.randn()

I am a bit confused by the NumPy function random.randn(), which returns random values drawn from the standard normal distribution in an array of whatever shape you choose.

My question is: when would this ever be useful in applied practice? I have no idea.

For reference, I am a complete programming noob, but I studied math (mostly stats-related courses) as an undergraduate.

Solution

NumPy's randn function is very useful for adding random noise to a synthetic dataset that you create for initial testing of a machine learning model. Say, for example, that you want a million-point dataset that is roughly linear, for testing a regression algorithm. You create a million evenly spaced x values using

import numpy as np
x_data = np.linspace(0.0, 10.0, 1000000)  # 1,000,000 evenly spaced points from 0 to 10

You then generate a million random noise values (mean 0, standard deviation 1), one per data point, using randn:

noise = np.random.randn(len(x_data))
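As a side note, randn always draws from the standard normal distribution (mean 0, standard deviation 1), so a different noise level is usually obtained by scaling and shifting the result. Below is a minimal sketch of that idea; the names sigma and mu, their values, and the fixed seed are illustrative assumptions, not part of the example above.

```python
import numpy as np

np.random.seed(0)  # fixed seed, chosen here only so the sketch is repeatable

n = 1_000_000
standard_noise = np.random.randn(n)  # samples from N(0, 1)

# Hypothetical noise parameters, picked just for illustration
sigma = 0.25  # desired standard deviation
mu = 0.0      # desired mean

# Scaling by sigma and shifting by mu gives samples from N(mu, sigma^2)
scaled_noise = mu + sigma * standard_noise

print("standard:", standard_noise.mean(), standard_noise.std())
print("scaled:  ", scaled_noise.mean(), scaled_noise.std())
```

With a million samples, the printed means and standard deviations should land very close to the requested values, which is a quick sanity check that the noise behaves as intended.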

To create your linear dataset, follow the formula y = mx + b + noise with the following code (setting m = 0.5 and b = 5 in this example):

y_data = 0.5 * x_data + 5 + noise

Finally, the dataset is assembled into a pandas DataFrame with

import pandas as pd
my_data = pd.DataFrame({'X Data': x_data, 'Y': y_data})
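To see why this is handy for testing, here is a minimal end-to-end sketch that builds the same kind of noisy linear dataset and then checks that a simple least-squares fit recovers m and b. Using np.polyfit as the "regression algorithm" and fixing the seed are assumptions made for this illustration; any regression model could stand in its place.

```python
import numpy as np
import pandas as pd

np.random.seed(42)  # fixed seed, chosen here only so the sketch is repeatable

# Build the noisy linear dataset: y = 0.5 * x + 5 + noise
x_data = np.linspace(0.0, 10.0, 1_000_000)
noise = np.random.randn(len(x_data))
y_data = 0.5 * x_data + 5 + noise

my_data = pd.DataFrame({"X Data": x_data, "Y": y_data})

# Fit a degree-1 polynomial (a straight line) by least squares.
# np.polyfit returns coefficients from highest degree down: [slope, intercept].
slope, intercept = np.polyfit(
    my_data["X Data"].to_numpy(), my_data["Y"].to_numpy(), 1
)

print("estimated m:", slope)
print("estimated b:", intercept)
```

Because the true m and b are known in advance, any fitting procedure can be judged by how close its estimates come to 0.5 and 5, which is the whole point of generating the data yourself.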