I have an issue with some data, and I simply cannot understand why.
I'm trying to estimate var4 from var3 using a GAM.
Here is the dataset I'm using to obtain my model:
for_model <- read.csv("https://raw.githubusercontent.com/fredlm/mockup/master/for_model.csv")
And the dataset in which I want to estimate var4:
for_est <- read.csv("https://raw.githubusercontent.com/fredlm/mockup/master/for_est.csv")
What I've done, simply:
library(dplyr)  # %>% and mutate()
library(mgcv)   # gam() and predict.gam()
for_est <- for_est %>%
  mutate(var4 = ifelse(!var3 == 0, predict.gam(gam(var4 ~ s(log(var3)), data = for_model), newdata = .), NA))
It returns the following error:
Error: Problem with mutate() column var4.
var4 = predict.gam(gam(var4 ~ s(log(var3)), data = for_model), newdata = .).
x NA/NaN/Inf in foreign function call (arg 1)
Despite thorough searching on the web and a few hours spent on my data, I can't find how to fix this...
However, when I plot the GAM, things work great:
ggplot(data = for_model, aes(x = var3, y = var4)) +
  geom_point() +
  geom_smooth(method = "gam", formula = y ~ s(log(x)))
Any idea how to fix this? I've looked for NaN or Inf values but there are none. Also, when I try to estimate var4 from var2, which is VERY similar to var3, things work well...
for_est <- for_est %>%
  mutate(var4 = ifelse(!var2 == 0, predict.gam(gam(var4 ~ s(log(var2)), data = for_model), newdata = .), NA))
Thanks a lot!
ps: my apologies for the rather large files, but given that I don't understand the problem, I thought it might make more sense to provide all of them... :)
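When you use ifelse to keep away from var3 == 0, you need to restrict the for_est input data the same way. ifelse only chooses between values after they have been computed, so predict still receives every row of for_est, including the ones where var3 == 0, and log(0) is -Inf.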
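To see the effect in isolation, here is a minimal sketch in base R. The vector x and the helper strict_log() are made up for illustration; strict_log() is only a stand-in for a function that rejects non-finite input.

```r
# Hypothetical stand-in for a function that refuses non-finite values
strict_log <- function(x) {
  out <- log(x)
  if (any(!is.finite(out))) stop("NA/NaN/Inf in input")
  out
}

x <- c(2, 0, 5)

# ifelse() evaluates strict_log(x) on the whole vector, zero included,
# so the call fails before any selection happens
guarded <- tryCatch(
  ifelse(x != 0, strict_log(x), NA_real_),
  error = function(e) conditionMessage(e)
)

# Subsetting first means the zero never reaches strict_log()
res <- rep(NA_real_, length(x))
res[x != 0] <- strict_log(x[x != 0])
```

Here guarded ends up holding the error message, while res has NA in the position of the zero and the logs elsewhere.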
(I split the model fitting from the prediction just to make testing faster; that doesn't change the result.)
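gamfit <- gam(var4 ~ s(log(var3)), data = for_model)
for_est <- for_est %>%
  # Predict only for rows with var3 != 0, then write the results back into exactly those rows.
  # replace() is used because ifelse() would recycle the shorter prediction vector out of row order.
  mutate(var4 = replace(rep(NA_real_, n()), var3 != 0, predict(gamfit, newdata = .[var3 != 0, ])))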
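A note on row alignment when the predictions are computed on a subset of rows: ifelse() recycles a shorter yes vector to the full length before picking values by position, whereas replace() (or indexed assignment) fills the selected positions in order. A small sketch with made-up values, where keep plays the role of var3 != 0 and pred the role of the predictions:

```r
keep <- c(TRUE, FALSE, TRUE, TRUE)
pred <- c(10, 20, 30)  # one made-up prediction per kept row

# ifelse() first recycles pred to c(10, 20, 30, 10), then takes positions 1, 3, 4
via_ifelse <- ifelse(keep, pred, NA_real_)

# replace() assigns pred, in order, to the positions where keep is TRUE
via_replace <- replace(rep(NA_real_, length(keep)), keep, pred)

via_ifelse   # 10 NA 30 10
via_replace  # 10 NA 20 30
```

So it is worth checking that each prediction lands on the row it was computed for, unless the excluded rows all sit at the end of the data.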