I have a tree growth database containing the number of cells in every growth stage (enlarging, thickening, mature) for every DOY (Day of the year) for every tree and every year (6 different trees per year, 10 years, 60 trees in total). The database looks like this (simplified):
Year Tree DOY Enlarging Thickening Mature
2012 25 80 0 0 0
2012 25 87 1 0 0
2012 25 94 4 0 0
2012 25 103 5 1 0
2012 25 111 3 3 0
2012 25 119 1 4 1
2012 25 127 1 5 3
2012 30 80 0 0 0
2012 30 87 2 0 0
2012 30 94 5 1 0
2012 30 103 7 3 1
2012 30 111 4 6 2
2012 30 119 3 7 5
2012 30 127 1 8 7
2012 43 80 0 0 0
2012 43 87 0 0 0
2012 43 94 2 0 0
etc.
I would like to apply a GAM function to obtain predictions about when does every growth stage start and finish for every tree, every year, and also understand the growth curves and pattern every stage follows. The model I use it's just a simple GAM between every growth phase number of cells (enlarging, thickening, mature) and the day of the year it occurs:
Enlarging <- gam(Enlarging ~ s(DOY), data=datosSTD, quasipoisson, gamma=1, min.sp=0.01)
Thickening <- gam(Thickening ~ s(DOY), data=datosSTD, quasipoisson, gamma=1, min.sp=0.01)
Mature <-gam(Mature ~ s(DOY), data=datosSTD, quasipoisson, gamma=1, min.sp=0.01)
My problem is that I am unable to apply this GAM model to every individual tree, or even apply it for the yearly average of trees. For example, I tried using dplyr with the enlarging phase:
Enlarging <- df %>%
group_by(Tree, Year)%>%
do(gam_enlarging = gam(Enlarging ~ s(DOY), data = ., quasipoisson, gamma = 1, min.sp = 0.01))%>%
ungroup
which gives me a list with the coefficients, residuals and fitted values among other data. But when I try to obtain the fitted values of the model for every tree I get an error:
fitted.enlarging <- data.frame(c(sapply(Enlarging$gam_enlarging, fitted)))
Error in (function (..., row.names = NULL, check.rows = FALSE, check.names = TRUE, : arguments imply differing number of rows: 19, 18, 23, 15, 26, 22, 29, 21
I tried using the function predict, but I get a different error:
predicted.enlarging <- as.vector(predict(Enlarging, data.frame(DOY),type="response"))
Error in UseMethod("predict") : no applicable method for 'predict' applied to an object of class "c('tbl_df', 'tbl', 'data.frame')"
I was thinking about using a loop, but honestly I don't think I have the knowledge to do that yet. I'm still learning basic R. I just need to know when does every growth phase start (nº Enlarging cells>1) and ends (nº Enlarging cells<0) for every tree, every year, and be able to plot a representation of the growth stage curve pattern. If possible, I would like to use the dplyr package to group my data and process it since it's the package I'm more familiar with.
A set of dataframe rearrangements with {dplyr} and {tidyr} allow you to group your data, nest them (i. e. populate a dataframe 'cell' with subsets of data), apply arbitrary functions (like gam
), and finally unnest the thing again.
It's probably best explained with an example. Using your data (in that case only one combination of Tree and Year, but use it on your whole data as desired):
your example data:
df <- structure(list(Year = c(2012L, 2012L, 2012L, 2012L, 2012L, 2012L,
2012L), Tree = c(25L, 25L, 25L, 25L, 25L, 25L, 25L), DOY = c(80L,
87L, 94L, 103L, 111L, 119L, 127L), Enlarging = c(0L, 1L, 4L,
5L, 3L, 1L, 1L), Thickening = c(0L, 0L, 0L, 1L, 3L, 4L, 5L),
Mature = c(0L, 0L, 0L, 0L, 0L, 1L, 3L)), class = "data.frame", row.names = c(NA,
7L))
required libraries:
library(dplyr)
library(tidyr)
library(gam)
Count
by growth status groups:df_long <-
df |>
pivot_longer(Enlarging:Mature,
names_to = 'Growth_Phase',
values_to = 'Count'
)
#> df_long |> head(3)
# A tibble: 3 x 5
Year Tree DOY Growth_Phase Count
<int> <int> <int> <chr> <int>
1 2012 25 80 Enlarging 0
2 2012 25 80 Thickening 0
3 2012 25 80 Mature 0
Year
, Tree
and Growth_Phase
:df_nested <-
df_long |>
nest_by(Year, Tree, Growth_Phase)
column data
of the nested dataframe contains one dataframe per row (= unique combination of Year
x Tree
x Growth_Phase
):
# A tibble: 3 x 4
# Rowwise: Year, Tree, Growth_Phase
Year Tree Growth_Phase data
<int> <int> <chr> <list<tibble[,2]>>
1 2012 25 Enlarging [7 x 2]
2 2012 25 Mature [7 x 2]
3 2012 25 Thickening [7 x 2]
list
, so it can be accomodated by a single cell of the main dataframe):df_with_gam <-
df_nested |>
rowwise() |>
mutate(the_gam = list(
gam(Count ~ s(DOY),
data = data, quasipoisson, gamma = 1, min.sp = 0.01)
),
Fitted = list(predict(the_gam, type = 'response'))
)
> df_with_gam
# A tibble: 3 x 6
# Rowwise:
Year Tree Growth_Phase data the_gam Fitted
<int> <int> <chr> <list<tibble[,2]>> <list> <list>
1 2012 25 Enlarging [7 x 2] <Gam> <dbl [7]>
2 2012 25 Mature [7 x 2] <Gam> <dbl [7]>
3 2012 25 Thickening [7 x 2] <Gam> <dbl [7]>
df_with_gam |>
select(Year:Growth_Phase, data, Fitted) |>
unnest_longer(c('data', 'Fitted')) |>
unnest_wider(data)
+ # A tibble: 21 x 7
Year Tree Growth_Phase DOY Count Fitted Fitted_id
<int> <int> <chr> <int> <int> <dbl> <chr>
1 2012 25 Enlarging 80 0 1.30e- 1 1
2 2012 25 Enlarging 87 1 9.71e- 1 2
3 2012 25 Enlarging 94 4 3.79e+ 0 3
4 2012 25 Enlarging 103 5 5.07e+ 0 4
5 2012 25 Enlarging 111 3 2.87e+ 0 5
6 2012 25 Enlarging 119 1 1.26e+ 0 6
7 2012 25 Enlarging 127 1 8.95e- 1 7
8 2012 25 Mature 80 0 5.94e-11 1
9 2012 25 Mature 87 0 3.23e-13 2
10 2012 25 Mature 94 0 4.53e-15 3
# ... with 11 more rows
filter
, pivot_wider
, arrange
(= sort) etc. as desired ...
quick control, df_result
being your resulting dataframe after above manipulations:
library(ggplot2)
df_result |>
ggplot() +
geom_point(aes(Count, Fitted)) +
facet_wrap(~ Growth_Phase)