Search code examples
rdataframetrend

Can we compute trend for all columns by group?


I have a data frame as shown below. I want to compute the trends of all numeric columns and if they are significant or not per name.

   library(EnvStats)
  dat=structure(list(date = c("1983-12-01", "1984-01-01", 
  "1984-02-01", 
                    "1984-03-01", "1984-04-01", "1984-05-01"), 
  rig = c(68.1, 62.4, 
                                                                       
   67.5, 78.9, 81.7, 72.2), pass = c(9.57, 10.49, 11.97, 
  11.43, 9.54, 
                                                                                                         
  8.98), name = structure(c(1L, 2L, 3L, 4L, 5L, 6L), levels = 
  c("az",  "az", "nc", "nc", "et", "et"), class = "factor")), 
  row.names = c(NA, 6L), class = "data.frame")

Is there a function in R to do this? We can use:

kendallTrendTest(rig ~ date)

but how should I do this to have the output like this for all colums?:

  name    rig_trend_slope   rig_trend_pvalue pass_trend_slope   pass_trend_pvalue
  az
  nc
  et

Solution

  • As @Quiten mentioned the number of observations in the data are limited, so a group by approach would return error. If we want to do this on multiple columns, loop across the columns in summarise, extract the needed components from the list output as a tibble and then use unnest_wider to create wide dataset

    library(dplyr)
    library(tidyr)
    library(EnvStats)
    library(tibble)
    library(flextable)
    library(officer)
    dat1 <- dat %>%
      mutate(date2 = as.numeric(as.Date(date))) %>%
      # group_by(name) %>%
      # or use reframe in the devel version
      summarise(across(rig:pass, ~ 
                         {tmp <- kendallTrendTest(reformulate('date2', 
         response = cur_column()))
                         tibble(trend_slope = tmp$estimate['slope'],
                                trend_pvalue = tmp$p.value)})) %>%
      unnest_wider(where(is_tibble), names_sep = "_") %>%
      flextable %>%
      colformat_double(i=1,digit=2 )
    for(nm in dat1$col_keys)
      {dat1 <- dat1 %>%
      color(i= as.formula(sprintf("~%s<0", nm)), j=nm, color="red")
      }
      
    

    -output

    enter image description here

    To save as word format, we may use save_as_docx from officer package

    tf <- tempfile(fileext = ".docx")
    save_as_docx(
      dat1, path = tf
    )
    

    -output enter image description here

    NOTE: Here the group_by(name) is commented as there are only 2 observations and it returns in error for all the cases. With the full dataset, uncomment the group_by(name)

    The broom::tidy seems to have an issue with the class from kendallTrendTest