I'm trying to display a plane of best fit within a 3D scatter plot using the library plot3D
. When the code below is run everything seems fine enough, but if I replace the fit
with the second fit
I get strange behaviour, the plane is no longer a flat plane. I expect both versions to produce the same picture. What's going on?
library(plot3D)
df <- structure(list(X = 1:10, TV = c(230.1, 44.5, 17.2, 151.5, 180.8,
8.7, 57.5, 120.2, 8.6, 199.8), radio = c(37.8, 39.3, 45.9, 41.3,
10.8, 48.9, 32.8, 19.6, 2.1, 2.6), newspaper = c(69.2, 45.1,
69.3, 58.5, 58.4, 75, 23.5, 11.6, 1, 21.2), sales = c(22.1, 10.4,
9.3, 18.5, 12.9, 7.2, 11.8, 13.2, 4.8, 10.6)), .Names = c("X",
"TV", "radio", "newspaper", "sales"), row.names = c(NA, 10L), class = "data.frame")
x<-df$TV
y<-df$radio
z<-df$sales
fit <- lm(z ~ x + y)
# fit <- lm(df$sales ~ df$TV + df$radio)
x.pred <- seq(min(x), max(x), length.out = 5)
y.pred <- seq(min(y), max(y), length.out = 5)
xy <- expand.grid( x = x.pred, y = y.pred)
z.pred <- matrix(predict(fit, newdata = xy), nrow = 5, ncol = 5)
scatter3D(x, y, z,
surf = list(x = x.pred, y = y.pred, z = z.pred)
)
The short answer is: Both fits are correct. However the second predict
is not finding the right column names to predict.
If you want the second fit to work use:
fit <- lm(sales ~ TV + radio, data=df)
...
xy <- expand.grid(TV = x.pred, radio = y.pred)
Why? Because predict
always searches for the column name it was trained in newdata
.
You may note that the first line in the code above also changed, we are no longer using the df$var
format, instead we use the data
argument. This happens because when using this format fit$model
is equals to:
df$sales df$TV df$radio
1 22.1 230.1 37.8
2 10.4 44.5 39.3
3 9.3 17.2 45.9
...
And we can't name column names with "$" dollar sign. In other words we can't do:
fit <- lm(df$sales ~ df$TV + df$radio)
...
xy <- expand.grid(df$TV = x.pred, df$radio = y.pred)
Because it will throw an error.
As stated above, both fits are, indeed, correct. If you run,
fit <- lm(z ~ x + y)
fit
you will get,
Coefficients: (Intercept) x y
2.08052 0.05598 0.15282
and with,
fit <- lm(df$sales ~ df$TV + df$radio)
fit
you will get,
Coefficients: (Intercept) x y
2.08052 0.05598 0.15282
as well.
Finally, note that when predict
with newdata
can't find the right variables names, you will get a warning message like this:
'newdata' had 25 rows but variables found have 10 rows
Which in my opinion should be an error. But it may get fixed in next versions. Some other sources about this issue are: