Search code examples
rtrend

Identify simple trend in a data sequence in R


I have data organized by the year it was collected. I would like to know how do I know if there was a trend of increase, decrease or stabilization of the data during the period.

I can do this manually, plotting the plots with ggplot and then visually checking for trends. But this would be unfeasible because I have many columns with data. I would like to do something automatic.

for example visually checking, I see a slight upward trend for the var1 variable:

library(ggplot2)
library(tidyverse)
df<-data.frame(year=c(2000,2001,2001,2002,2000,2002,2000,2001,2002,2001),
           var1=c(1,2,3,4,5,6,7,8,9,10),
           var2=c(2,3,6,4,8,12,13,4,21,3),
           var3=c(0.3,8,6,5,3,2,1,0.6,0.8,0.5),
           var4=sample(-5:5, size = 10))
df

ggplot(df, aes(x=year, y=var1))+
  geom_point(aes(color = "Mean"), size=2.5)+
  stat_smooth(aes(color = "Trend"), se=FALSE)

enter image description here

Would there be a possibility for R to do this check automatically? and create a new column indicating the variables increased, decreased or stabilized in variables var1, var2, var3, var4?


Solution

  • This shows for each variable the estimate of the slope and the p-value. Smaller p-values are more significant. Here only var4 has a slope significantly different from zero at the 5% level but you could adjust what cutoff to use which does not change the code.

    library(dplyr)
    library(broom)
    
    lm(as.matrix(df[-1]) ~ year, df) %>%
      tidy %>%
      filter(term == "year")
    ## # A tibble: 4 × 6
    ##   response term  estimate std.error statistic p.value
    ##   <chr>    <chr>    <dbl>     <dbl>     <dbl>   <dbl>
    ## 1 var1     year     1.00       1.26     0.792  0.451 
    ## 2 var2     year     2.33       2.49     0.937  0.376 
    ## 3 var3     year     0.583      1.16     0.504  0.628 
    ## 4 var4     year     2.50       1.07     2.34   0.0476
    
    
    library(lattice)
    library(tidyr)
    
    df |>
      pivot_longer(-year) |>
      xyplot(value ~ year | name, data = _, type = c("p", "r"), as.table = TRUE)
    

    screenshot