I am currently working on a linear regression project where I need to gather data, fit a model to it, and then make predictions on test data.
If I'm correct, simple linear regression works with two variables, X (independent) and Y (dependent). I have the following Dataset, where I consider the minute column to be X and the value column to be Y:
+-----+------+
|value|minute|
+-----+------+
| 5000|   672|
| 6000|   673|
| 7000|   676|
| 8000|   678|
| 9000|   680|
+-----+------+
What I don't know is how to fit this Dataset correctly into a Linear Regression Model. I've worked with k-means before, and there I created a features column in vector form. I did the same with this dataset:
VectorAssembler assembler = new VectorAssembler()
.setInputCols(new String[]{"minute", "value"})
.setOutputCol("features");
Dataset<Row> vectorData = assembler.transform(dataset);
I then fit this into a linear regression model:
LinearRegression lr = new LinearRegression();
LinearRegressionModel model = lr.fit(vectorData);
This is the part where I get stuck. How can I make predictions with this model? I want to find the value of value when minute is equal to an arbitrary minute, e.g. 700.
How can I do that? How can I find a prediction/estimate of my Y value based on a random X value?
EDIT: Does the linear regression model differentiate between dependent and independent variables? How?
So thanks to the feedback of @RickMoritz and @JacekLaskowski I was able to figure out the solution:
LinearRegression does indeed have X and Y columns. The X column is the features column and the Y column is the label column.
So before fitting a LinearRegression model to your dataset, make sure to specify your label and features columns. You can set your label column when you define your LinearRegression:
LinearRegression lr = new LinearRegression().setLabelCol(Ycolumn_name);
For the features column, make sure you convert your X column into a vector column first (for example with VectorAssembler), and then you can do the same:
LinearRegression lr = new LinearRegression().setFeaturesCol(Xcolumn_name);
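Applied to the dataset from the question, those two settings might look roughly like the sketch below. It is only a sketch under a few assumptions: the class name FitMinuteValue and the helper fit are made up for illustration, both columns are stored as doubles, and Spark runs in local mode. Unlike the assembler call in the question, only minute goes into the features vector here, because putting value in there as well would feed the label to the model as an input.

```java
import java.util.Arrays;
import java.util.List;

import org.apache.spark.ml.feature.VectorAssembler;
import org.apache.spark.ml.regression.LinearRegression;
import org.apache.spark.ml.regression.LinearRegressionModel;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.RowFactory;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.StructField;
import org.apache.spark.sql.types.StructType;

// Hypothetical class name, used only for this illustration.
public class FitMinuteValue {

    // Rebuilds the five-row table from the question, assembles "minute" into a
    // one-element features vector, and fits "value" (the label) against it.
    static LinearRegressionModel fit(SparkSession spark) {
        List<Row> rows = Arrays.asList(
                RowFactory.create(5000.0, 672.0),
                RowFactory.create(6000.0, 673.0),
                RowFactory.create(7000.0, 676.0),
                RowFactory.create(8000.0, 678.0),
                RowFactory.create(9000.0, 680.0));
        // Assumption: both columns are doubles; adjust the types to your data.
        StructType schema = DataTypes.createStructType(new StructField[]{
                DataTypes.createStructField("value", DataTypes.DoubleType, false),
                DataTypes.createStructField("minute", DataTypes.DoubleType, false)
        });
        Dataset<Row> dataset = spark.createDataFrame(rows, schema);

        // X: only the independent variable goes into the features vector.
        VectorAssembler assembler = new VectorAssembler()
                .setInputCols(new String[]{"minute"})
                .setOutputCol("features");
        Dataset<Row> vectorData = assembler.transform(dataset);

        // Y: the dependent variable is declared as the label column.
        LinearRegression lr = new LinearRegression()
                .setFeaturesCol("features")
                .setLabelCol("value");
        return lr.fit(vectorData);
    }

    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("FitMinuteValue")
                .master("local[*]") // assumption: local mode, for trying it out
                .getOrCreate();
        LinearRegressionModel model = fit(spark);
        System.out.println("coefficients=" + model.coefficients()
                + " intercept=" + model.intercept());
        spark.stop();
    }
}
```

If the features column is already called features, the setFeaturesCol call is redundant, since that is the default name, but spelling it out makes the X/Y split explicit.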
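Once you've done that you're all set. To get a prediction, wrap your X value in a vector (e.g. Vectors.dense(700.0)) and pass it to the predict() method of the LinearRegressionModel. Note that predict() is only public from Spark 3.0 on; with older versions, call transform() on a Dataset that has a features column and read the prediction column instead.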
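As a cross-check on whatever the model returns for minute 700, the ordinary least squares line for the five rows in the question can also be worked out by hand. The plain-Java sketch below does not use Spark at all; the class LeastSquaresCheck and its methods are hypothetical names for this illustration, and it assumes an unregularized fit with minute as the only feature.

```java
import java.util.Locale;

// Hypothetical helper class, independent of Spark, used only as a sanity check.
public class LeastSquaresCheck {

    // Returns {slope, intercept} of the least squares line y = slope * x + intercept.
    static double[] fit(double[] x, double[] y) {
        int n = x.length;
        double meanX = 0.0;
        double meanY = 0.0;
        for (int i = 0; i < n; i++) {
            meanX += x[i] / n;
            meanY += y[i] / n;
        }
        double sxy = 0.0; // sum of (x - meanX) * (y - meanY)
        double sxx = 0.0; // sum of (x - meanX)^2
        for (int i = 0; i < n; i++) {
            sxy += (x[i] - meanX) * (y[i] - meanY);
            sxx += (x[i] - meanX) * (x[i] - meanX);
        }
        double slope = sxy / sxx;
        return new double[]{slope, meanY - slope * meanX};
    }

    // Evaluates the fitted line at a single x value.
    static double predict(double[] line, double x) {
        return line[0] * x + line[1];
    }

    public static void main(String[] args) {
        double[] minute = {672, 673, 676, 678, 680};
        double[] value = {5000, 6000, 7000, 8000, 9000};
        double[] line = fit(minute, value);
        System.out.printf(Locale.ROOT, "slope=%.2f intercept=%.2f%n", line[0], line[1]);
        // prints: slope=468.75 intercept=-309781.25
        System.out.printf(Locale.ROOT, "prediction at minute 700: %.2f%n", predict(line, 700));
        // prints: prediction at minute 700: 18343.75
    }
}
```

A Spark model fitted with only minute in the features vector and no regularization should land close to these numbers. If value is also left in the features vector, as in the question's assembler, it will not.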