Search code examples
rscatter3dplot3d

3D scatterplot done with plot3D producing strange behaviour


I'm trying to display a plane of best fit within a 3D scatter plot using the library plot3D. When the code below is run everything seems fine enough, but if I replace the fit with the second fit I get strange behaviour, the plane is no longer a flat plane. I expect both versions to produce the same picture. What's going on?

enter image description here

library(plot3D)

df <- structure(list(X = 1:10, TV = c(230.1, 44.5, 17.2, 151.5, 180.8, 
8.7, 57.5, 120.2, 8.6, 199.8), radio = c(37.8, 39.3, 45.9, 41.3, 
10.8, 48.9, 32.8, 19.6, 2.1, 2.6), newspaper = c(69.2, 45.1, 
69.3, 58.5, 58.4, 75, 23.5, 11.6, 1, 21.2), sales = c(22.1, 10.4, 
9.3, 18.5, 12.9, 7.2, 11.8, 13.2, 4.8, 10.6)), .Names = c("X", 
"TV", "radio", "newspaper", "sales"), row.names = c(NA, 10L), class = "data.frame")


x<-df$TV
y<-df$radio
z<-df$sales

fit <- lm(z ~ x + y)
# fit <- lm(df$sales ~ df$TV + df$radio)

x.pred <- seq(min(x), max(x), length.out = 5)
y.pred <- seq(min(y), max(y), length.out = 5)
xy <- expand.grid( x = x.pred, y = y.pred)

z.pred <- matrix(predict(fit, newdata = xy), nrow = 5, ncol = 5)

scatter3D(x, y, z,
    surf = list(x = x.pred, y = y.pred, z = z.pred)
    )

Solution

  • The short answer is: Both fits are correct. However the second predict is not finding the right column names to predict.


    If you want the second fit to work use:

    fit <- lm(sales ~ TV + radio, data=df)
    ...
    xy <- expand.grid(TV = x.pred, radio = y.pred)
    

    Why? Because predict always searches for the column name it was trained in newdata.

    You may note that the first line in the code above also changed, we are no longer using the df$var format, instead we use the data argument. This happens because when using this format fit$model is equals to:

       df$sales df$TV df$radio
    1      22.1 230.1     37.8
    2      10.4  44.5     39.3
    3       9.3  17.2     45.9
    ...
    

    And we can't name column names with "$" dollar sign. In other words we can't do:

    fit <- lm(df$sales ~ df$TV + df$radio)
    ...
    xy <- expand.grid(df$TV = x.pred, df$radio = y.pred)
    

    Because it will throw an error.


    As stated above, both fits are, indeed, correct. If you run,

    fit <- lm(z ~ x + y)
    fit
    

    you will get,

    Coefficients: (Intercept) x y
    2.08052 0.05598 0.15282

    and with,

    fit <- lm(df$sales ~ df$TV + df$radio)
    fit
    

    you will get,

    Coefficients: (Intercept) x y
    2.08052 0.05598 0.15282

    as well.


    Finally, note that when predict with newdata can't find the right variables names, you will get a warning message like this:

    'newdata' had 25 rows but variables found have 10 rows
    

    Which in my opinion should be an error. But it may get fixed in next versions. Some other sources about this issue are: