Search code examples
rloopstime-seriestrend

In R, how do I create individual time series (to loop trough and run a Mann-Kendall test) from a larger dataset?


I am starting to automate our Mann-Kendall tests used for time series data of water quality data. It was done in Excel before, but was too much copying and pasting. I want to create an R script to run the MK test (package ‘trend’) and calculate the Sens Slope. I have the following example table with our water quality data:

| Site | Program | Year | parameter1 | parameter2 |
|------|---------|------|-----|-----|
| A    | ABC     | 1990 | 5   | 100 |
| A    | ABC     | 1991 | 10  | 75  |
| A    | ABC     | 1992 | 15  | 50  |
| A    | ABC     | 1993 | 20  | 25  |
| A    | ABC     | 1994 | 25  | 5   |
| B    | ABC     | 1990 | 10  | 88  |
| B    | ABC     | 1991 | 20  | 44  |
| B    | ABC     | 1992 | 30  | 22  |
| B    | ABC     | 1993 | 40  | 11  |
| B    | ABC     | 1994 | 50  | 6   |
| C    | XYZ     | 1990 | 6   | 64  |
| C    | XYZ     | 1991 | 12  | 44  |
| C    | XYZ     | 1992 | 18  | 24  |
| C    | XYZ     | 1993 | 24  | 14  |
| C    | XYZ     | 1994 | 30  | 4   |
| D    | XYZ     | 1990 | 7   | 99  |
| D    | XYZ     | 1991 | 14  | 88  |
| D    | XYZ     | 1992 | 21  | 77  |
| D    | XYZ     | 1993 | 28  | 66  |
| D    | XYZ     | 1994 | 35  | 55  |

I need to take out each time series in the data (so for sites A,B,C,D) for each parameter (ANC and SO4) and run the MannKendall test in R (code below). I need an output table to look like the following, but with the MK statistic and sens slope filled in (not 1 like it shows below).

| Site | Program | Parameter | MK Statistic | Sens Slope |
|------|---------|-----------|--------------|------------|
| A    | ABC     | ANC       | 1            | 1          |
| A    | ABC     | SO4       | 1            | 1          |
| B    | ABC     | ANC       | 1            | 1          |
| B    | ABC     | SO4       | 1            | 1          |
| C    | XYZ     | ANC       | 1            | 1          |
| C    | XYZ     | SO4       | 1            | 1          |
| D    | XYZ     | ANC       | 1            | 1          |
| D    | XYZ     | SO4       | 1            | 1          |

Any idea on how to produce this output table? I know it will require a loop at some point, but not entirely sure where to start. Perhaps a for each site, program, and then either ANC or S04. The R code below is from an individual site and parameter combo, but this would be a pain to have to replicate for the 100’s of sites and 6 water quality parameters we have.

install.packages("trend")
library("trend")

#put our data in a time series (but this only creates 1 site and its time series)
time_series <- ts(Trends$parameter1, start=c(1990, 1), end=c(1994, 1), frequency=1)
print(time_series)


#Run the MK Test and Sens Slope from package trend
mk.test(time_series, alternative = c("two.sided", "greater", "less"),
continuity = TRUE)
sens.slope(time_series, conf.level = 0.95) 

Output Example - these are results from my actual data, not the example dataset (since I haven't succesfully run the MKtest on all sites in the example data). Numbers with the ^^^^ below them are the numbers I need for the final output table.


> mk.test(time_series , alternative = c("two.sided", "greater", "less"),
+ continuity = TRUE)

    Mann-Kendall trend test

data:  time_series
z = -5.7308, n = 26, p-value = 9.996e-09
alternative hypothesis: true S is not equal to 0
sample estimates:
           S         varS          tau 
-261.0000000 2058.3333333   -0.8030769 
^^^^^^^^^^^^

> sens.slope(time_series , conf.level = 0.95)

    Sens slope

data:  time_series
z = -5.7308, n = 26, p-value = 9.996e-09
alternative hypothesis: true z is not equal to 0
95 percent confidence interval:
 -1.3187075 -0.9495238
sample estimates:
Sens slope
  -1.136842
  ^^^^^^^^^

Solution

  • I have a different approach that uses the tidyverse. It might be a bit outside your experience so far but I recommend checking it out as I find it is easier to use than base R when you are coming from excel

    library(dplyr)
    library(ggplot2)
    library(tidyr)
    library(purrr)
    
    install.packages("trend")
    library("trend")  
    
    out <- dat %>% gather(parameter, value, ANC, SO4) %>% 
      group_by(parameter, Site) %>% nest() %>% 
      mutate(ts_out = map(data, ~ts(.x$value, start=c(1990, 1), end=c(1994, 1), frequency=1))) %>% 
      mutate(mk_res = map(ts_out, ~mk.test(.x, alternative = c("two.sided", "greater", "less"),
                                          continuity = TRUE)),
             sens = map(ts_out, ~sens.slope(.x, conf.level = 0.95))) %>% 
      mutate(mk_stat = map_dbl(mk_res, ~.x$p.value),
             sens_stat = map_dbl(sens, ~.x$p.value)) %>% 
      select(parameter, Site, mk_stat, sens_stat)
    
    out
    # A tibble: 8 x 4
      parameter Site     mk_stat sens_stat
      <chr>     <fct>      <dbl>     <dbl>
    1 ANC       " A    "  0.0275    0.0275
    2 ANC       " B    "  0.0275    0.0275
    3 ANC       " C    "  0.0275    0.0275
    4 ANC       " D    "  0.0275    0.0275
    5 SO4       " A    "  0.0275    0.0275
    6 SO4       " B    "  0.0275    0.0275
    7 SO4       " C    "  0.0275    0.0275
    8 SO4       " D    "  0.0275    0.0275
    

    This gives the output in a table. I am not sure if that is the part you want to pull out from the tests but that should be easier to change

    I recommend looking at the output at each step to get an idea of the structure. A good resource for this style of analysis is R for Data Science Many Models Chapter