I have taken a 3D
surface from package akima
.
library(akima)
data(akima)
matrix_1 <- akima::interp(x = akima$x, y = akima$y, z = akima$z)
persp(matrix_1$x, matrix_1$y, matrix_1$z)
To this surface, I have added some outlier data points.
# Add outliers
akima$x = c(akima$x, 11, 12, 13)
akima$y = c(akima$y, 11, 12, 13)
akima$z = c(akima$z, 70, 80, 75)
matrix_2 <- akima::interp(x = akima$x, y = akima$y, z = akima$z)
persp(x = matrix_2$x, y = matrix_2$y, z = matrix_2$z)
Now, I want to smooth these outliers to the original surface without disturbing the original surface but I could not accomplish that. Here is what I have tried and I will appreciate it if someone can show me how to do this.
# Trying to smooth and remove outliers
library(mgcv)
mod <- gam(z ~ te(x) + te(y) + ti(x, y), data = akima)
akima$new = predict(mod, akima)
matrix_3 <- akima::interp(x = akima$x, y = akima$y, z = akima$new)
persp(matrix_3$x, matrix_3$y, matrix_3$z)
Here is the surface after smoothing with gam
, which is not the expected output. The expected output is the first graph with the new values of outliers.
This is more of a statistical question than a computing question, but one approach would be to look at the size of the residuals from your gam()
fit, and only modify the z
coordinate if the residual is particularly large.
For example:
library(akima)
data(akima)
akima$x <- c(akima$x, 11, 12, 13)
akima$y <- c(akima$y, 11, 12, 13)
akima$z <- c(akima$z, 70, 80, 75)
# Trying to smooth and remove outliers
library(mgcv)
#> Warning: package 'mgcv' was built under R version 4.1.2
#> Loading required package: nlme
#> Warning: package 'nlme' was built under R version 4.1.2
#> This is mgcv 1.8-39. For overview type 'help("mgcv-package")'.
mod <- gam(z ~ te(x) + te(y) + ti(x, y), data = akima)
akima$pred <- predict(mod, akima)
akima$resid <- akima$z - akima$pred
hist(akima$resid)
akima$new <- ifelse(akima$resid < 25, akima$z, akima$pred)
matrix_3 <- akima::interp(x = akima$x, y = akima$y, z = akima$new)
persp(matrix_3$x, matrix_3$y, matrix_3$z)
Created on 2022-03-29 by the reprex package (v2.0.1)
I just chose the cutoff of 25 by eye; if you want to use a more automatic method, you'll need to ask about that on stats.stackexchange.com . You could also iterate a bit, to get better predictions without including the outliers. Again, that's a stats question.