I have been given a problem in a Jupyter notebook to code in Python. The problem is about linear regression, and it reads as follows:
1: Linear Regression
In this notebook we will generate data from a linear function, 𝐲=𝐗𝛽+𝜖, and then solve for 𝛽̂ using OLS (ordinary least squares) and gradient descent.
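Question 1.1 only covers data generation. For orientation, here is a rough sketch of the two fitting steps the notebook mentions. It assumes synthetic data with a made-up `beta_true`. It solves OLS with `np.linalg.lstsq`, which is numerically safer than explicitly inverting XᵀX, and it runs plain fixed-step gradient descent on the mean squared error. The helper `gradient_descent`, the learning rate, and the iteration count are illustrative choices, not the notebook's.

```python
import numpy as np

rng = np.random.default_rng(0)
M, N = 200, 3
X = rng.standard_normal((M, N))
beta_true = np.array([[2.0], [-1.0], [0.5]])          # illustrative "true" coefficients
y = X @ beta_true + 0.1 * rng.standard_normal((M, 1))  # y = X beta + eps

# OLS: minimise ||y - X b||^2; lstsq returns (solution, residuals, rank, singular values)
beta_ols, *_ = np.linalg.lstsq(X, y, rcond=None)

def gradient_descent(X, y, lr=0.1, n_iter=1000):
    """Hypothetical helper: minimise L(b) = ||y - X b||^2 / M with fixed-step updates."""
    M = X.shape[0]
    b = np.zeros((X.shape[1], 1))
    for _ in range(n_iter):
        # Gradient of the mean squared error: -2/M * X^T (y - X b)
        grad = (-2.0 / M) * (X.T @ (y - X @ b))
        b -= lr * grad
    return b

beta_gd = gradient_descent(X, y)
print(beta_ols.ravel())
print(beta_gd.ravel())
```

On well-conditioned data like this, both estimates should land close to `beta_true`, and gradient descent should converge to essentially the same answer as `lstsq`.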
Question 1.1: Generate data: 𝐲=𝐗𝛽+𝜖
Here we assume 𝑦≈𝑔(𝑋,𝛽)=𝐗𝛽+𝜖, where 𝑔 is linear in 𝛽 with additive noise 𝜖. Your function should have the following properties:
- output y as an np.array with shape (M,1)
- generate_linear_y should work for any arbitrary x, b, and eps, as long as they have the appropriate dimensions
- do not use for-loops to calculate each y[i] separately, as this will be very slow for large M and N. Instead, you should leverage NumPy linear algebra.
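The no-for-loop rule is easier to see with the two approaches side by side. This sketch compares a per-row loop with a single matrix product. Both helper names (`y_with_loop`, `y_vectorized`) are hypothetical. They produce the same (M,1) result, but the vectorized version hands the whole computation to NumPy's compiled code instead of running M Python-level iterations.

```python
import numpy as np

def y_with_loop(X, b):
    # Hypothetical slow reference: one dot product per row (what the assignment says NOT to do)
    M = X.shape[0]
    y = np.empty((M, 1))
    for i in range(M):
        y[i, 0] = X[i] @ b[:, 0]
    return y

def y_vectorized(X, b):
    # Hypothetical fast version: one matrix product covers all rows, (M, N) @ (N, 1) -> (M, 1)
    return X @ b

rng = np.random.default_rng(1)
X = rng.standard_normal((1000, 5))
b = rng.standard_normal((5, 1))
print(np.allclose(y_with_loop(X, b), y_vectorized(X, b)))  # True
```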
We are expected to fill in the following function skeleton:
```python
def generate_linear_y(X, b):
    """Write a function that generates M data points from inputs X and b

    Parameters
    ----------
    X : numpy.ndarray
        X.shape must be (M,N)
        Each row of `X` is a single data point of dimension N
        Therefore `X` represents M data points
    b : numpy.ndarray
        b.shape must be (N,1)
        Each element of `b` is a value of beta such that b = [[b1], [b2], ..., [bN]]

    Returns
    -------
    y : numpy.ndarray
        y.shape = (M,1)
        y[i] = X[i]b
    """
```
Can someone please assist me? I am thoroughly confused! I didn't even realize this task required array programming in Python, which I always struggle with. Please help!
This looks like a direct matrix multiplication to me: each y[i] is the dot product of row X[i] with b, so the whole (M,1) vector y is just X times b. In NumPy, this is implemented with the matrix multiplication operator `@` (aka `np.matmul`).
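A minimal sketch of the skeleton filled in this way, assuming the inputs already have the shapes the docstring promises. The shape check is an optional addition, not something the assignment asks for.

```python
import numpy as np

def generate_linear_y(X, b):
    """Return y = X b for X of shape (M, N) and b of shape (N, 1); y has shape (M, 1)."""
    X = np.asarray(X)
    b = np.asarray(b)
    # Optional sanity check on the shapes the docstring requires
    if X.ndim != 2 or b.shape != (X.shape[1], 1):
        raise ValueError(f"expected X (M, N) and b (N, 1), got {X.shape} and {b.shape}")
    # (M, N) @ (N, 1) -> (M, 1); row i of the result is X[i] . b
    return X @ b

X = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])  # M=3, N=2
b = np.array([[10.0], [1.0]])                       # shape (2, 1)
y = generate_linear_y(X, b)
print(y.shape)    # (3, 1)
print(y.ravel())  # [12. 34. 56.]
```

Keeping b as an (N,1) column (rather than a flat (N,) array) is what makes the result come out as (M,1) instead of (M,).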
To generate random noise, you can use the functions from `numpy.random`, most likely `random_sample` (uniform values on [0, 1)) or `standard_normal` (standard Gaussian values). If you want to do it the recommended way, create a random number generator with `default_rng`, then use, for instance, `rng.standard_normal`.
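Putting the pieces together, here is a sketch of generating noisy data. The skeleton above only takes X and b, while the properties list mentions eps, so this assumes the noise is a separate (M,1) array added to X @ b. The seed, `sigma`, and the values in `b` are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(seed=42)  # seeded Generator, so the noise is reproducible

M, N = 100, 3
X = rng.standard_normal((M, N))
b = np.array([[1.5], [-2.0], [0.7]])

# Gaussian noise with the same (M, 1) shape as X @ b; sigma is an assumed noise level
sigma = 0.1
eps = sigma * rng.standard_normal((M, 1))

y = X @ b + eps
print(y.shape)  # (100, 1)
```

A Gaussian from `standard_normal` is zero-mean, which suits additive noise. Uniform draws on [0, 1) would shift every y upward on average unless you centre them first.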