TL;DR [
x I'm purring a scatter plot for each species in a tibble with two types of regressions superimposed. Shows height predicted by diameter for trees.
x nls
finds multiple possible data
and fails to compute geom_smooth
- a tidyeval error?
x I'm not sure how to use a user defined function with 'map2'.
]
A sample of my dataframe, train.data
, is attached as a dput
output at the end of the message.
I've split my data into a test set (20%) and a training set (80%). I've calculated summaries for the linear and non-linear models earlier and made a plot with the predicted values versus the estimated values. But I'd like a graph with the curve for the estimated models (linear and non-linear), and if I've understood it correctly, ggplot2
should come to the same conclusion as nls
and lm
? A tidyverse way to include offset (same for all observations) in the tibble instead of in the data.frame would be very welcome.
First, creating plotting function to map. NLS is red, LM is blue.
double_mapper <- function(x, colname) {
ggplot(data = x, aes(x=dia, y=Height)) +
geom_point(shape=1) +
ggtitle(label = colname)+
theme_bw() +
theme(legend.title=element_blank(), axis.title = element_blank())+
geom_smooth(method="nls",
formula= Height ~ -1 +I(dia^2)/I((a+b*dia)^2),
method.args = list(offset=offset,
start = list(a=10, b=0.2), #Earlier study solution
se=F),
color="red")+
geom_smooth(method="lm",
formula= Height ~ -1 + dia,
method.args= list(offset=offset),
color="blue"
)
}
Create a tibble with nested species and create a graph for each.
mixed_df_test <- train.data %>%
group_by(SPP) %>%
nest() %>%
mutate(graphs=map2(.x = data,.y = SPP, partial(double_mapper,
x= .x,
colname=.y)))
plots_model_mixed <- ggpubr::ggarrange(plotlist = mixed_df_test$graphs, common.legend=TRUE,legend = "top",ncol = 2,nrow = 4)
Error messages:
from map2
Error in (function (x, colname) : unused arguments (.x[[1]], .y[[1]])
from nls
Warning messages:
1: Computation failed in `stat_smooth()`:
parameters without starting value in 'data': Height, dia
dput of train.data:
structure(list(SPP = c("Abies sibirica", "Abies sibirica", "Abies sibirica",
"Abies sibirica", "Abies sibirica", "Pinus sylvestris", "Pinus sylvestris",
"Pinus sylvestris", "Pinus sylvestris", "Pinus sylvestris"),
Height = c(6, 7.6, 9.9, 6.2, 8.1, 8.3, 7.7, 8.2, 7.8, 9.6
), dia = c(74.4580418759451, 96.2808392152873, 115.995689575087,
84.4985206971104, 104.498803820905, 141.492049246592, 151.459565561241,
177.997190989072, 190.499343830891, 152), offset = c(1.3,
1.3, 1.3, 1.3, 1.3, 1.3, 1.3, 1.3, 1.3, 1.3)), row.names = c(NA,
-10L), class = c("grouped_df", "tbl_df", "tbl", "data.frame"), groups = structure(list(
SPP = c("Abies sibirica", "Pinus sylvestris"), .rows = list(
1:5, 6:10)), row.names = c(NA, -2L), class = c("tbl_df",
"tbl", "data.frame"), .drop = TRUE))
There were a number of problems in the original code.
1) You needed a ~
before partial
. Actually, you didn't need partial in this example.
2) In the formula in geom_smooth
you have to use x
and y
rather than the names of the original variables.
3) You need to tell ggplot where to find the offset
4) se
is an argument to geom_smooth
not one of the method.args
double_mapper <- function(x, colname) {
ggplot(data = x, aes(x=dia, y=Height)) +
geom_point(shape=1) +
ggtitle(label = colname)+
theme_bw() +
theme(legend.title=element_blank(), axis.title = element_blank())+
geom_smooth(method="nls",
formula = y ~ -1 +I(x^2)/I((a+b*x)^2),
method.args = list(offset=x$offset,
start = list(a=10, b=0.2)), #Earlier study solution
se = FALSE,
color="red") +
geom_smooth(method="lm",
formula= y ~ -1 + x,
method.args= list(offset=x$offset),
color="blue"
)
}
mixed_df_test <- train.data %>%
group_by(SPP) %>%
nest() %>%
mutate(graphs=map2(.x = data,.y = SPP, ~double_mapper(
x= .x,
colname=.y)))
plots_model_mixed <- ggpubr::ggarrange(plotlist = mixed_df_test$graphs, common.legend=TRUE,legend = "top",ncol = 2,nrow = 4)
plots_model_mixed
I am fairly certain that you could use facets rather than multiple plots - this would make much simpler code. I am not sure how to specify the offset though (It might be better to fit the models outside of the plot and provide the fitted values in a data.frame.
If facets don't work, have a look at the patchwork
package for an easy way to combine plots.