I was learning Machine Learning from this course on Coursera taught by Andrew Ng. The instructor defines the hypothesis as a linear function of the "input" (x, in my case) like the following:
hθ(x) = θ0 + θ1(x)
In supervised learning, we have some training data, and based on that we try to "deduce" a function which closely maps the inputs to the corresponding outputs. To deduce the function, we introduce the hypothesis as a linear function of the input (x). My question is, why is a function involving two θs chosen? Why can't it be as simple as y(i) = a * x(i), where a is a coefficient? We could then go about finding a "good" value of a for the given examples (i) using an algorithm. This question might look very stupid. I apologize, but I'm not very good at machine learning; I am just a beginner. Please help me understand this.
Thanks!
The a corresponds to θ1. Your proposed linear model is leaving out the intercept, which is θ0.
Consider an output function y equal to the constant 5, or perhaps equal to a constant plus some tiny fraction of x which never exceeds 0.01. Driving the error function to zero is going to be difficult if your model doesn't have a θ0 that can soak up the D.C. component (the constant offset).
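To make that concrete, here is a minimal sketch in plain Python. The data (y = 5 + 0.001 * x for x from 1 to 10) and the helper names are made up for illustration, and it uses closed-form least squares rather than the gradient descent taught in the course. It fits both your model y = a * x and the two-parameter model, then compares the mean squared error of each.

```python
# Hypothetical data: a constant 5 plus a tiny fraction of x.
xs = [float(i) for i in range(1, 11)]
ys = [5.0 + 0.001 * x for x in xs]


def fit_no_intercept(xs, ys):
    """Least-squares fit of y = a * x (line forced through the origin)."""
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)


def fit_with_intercept(xs, ys):
    """Least-squares fit of y = theta0 + theta1 * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    theta1 = sxy / sxx
    theta0 = mean_y - theta1 * mean_x
    return theta0, theta1


def mse(predict, xs, ys):
    """Mean squared error of a prediction function over the data."""
    return sum((predict(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


a = fit_no_intercept(xs, ys)
theta0, theta1 = fit_with_intercept(xs, ys)

mse_no_intercept = mse(lambda x: a * x, xs, ys)
mse_with_intercept = mse(lambda x: theta0 + theta1 * x, xs, ys)

print("a only:          a =", a, " MSE =", mse_no_intercept)
print("theta0 + theta1: theta0 =", theta0, " theta1 =", theta1,
      " MSE =", mse_with_intercept)
```

With only a, the best the fit can do is tilt a line through the origin, so the error stays well above zero. With θ0 available, the fit recovers the constant and the error drops to essentially zero.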