
R: do a calculation for each factor level separately, then calculate min/mean/max over levels


So I have the output of a water distribution model: hourly inflow and discharge values of a river. I have done 5 model runs.

Reproducible example:

df <- data.frame(
  time = rep(seq(from = as.POSIXct("2012-01-01 00:00", tz = "UTC"),
                 to = as.POSIXct("2012-01-01 23:00", tz = "UTC"),
                 by = "hour"), 5),
  run = factor(rep(1:5, each = 24)),
  inflow = rep(seq(1, 300, length.out = 24), 5),
  discharge = rep(seq(1, 180, length.out = 24), 5)
)

In reality, of course, the values differ between runs. (And I have a lot more data: 100 runs with hourly values for 35 years.)

First, I would like to calculate a water scarcity factor for every run, i.e. something like 1 - (discharge / inflow 6 hours earlier), because the water needs 6 hours to flow through the catchment.
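Written out with an explicit run index r (a notation introduced here, with Q for discharge, I for inflow and t in hours), the factor described above is computed within each run, and the summaries asked for further down are then taken across runs at a fixed time step:

$$S_{r,t} = 1 - \frac{Q_{r,t}}{I_{r,t-6}}, \qquad \overline{S}_t = \frac{1}{R}\sum_{r=1}^{R} S_{r,t}, \quad S^{\max}_t = \max_r S_{r,t}, \quad S^{\min}_t = \min_r S_{r,t}$$

As a worked check on the toy data: at t = 06:00, Q = 1 + 6 · 179/23 ≈ 47.70 and I at 00:00 is 1, so S ≈ 1 - 47.70 = -46.70, which is the value the answer below reports for that hour.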

 scarcityfactor <- 1 - (discharge / lag(inflow,6))
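The `lag()` here has to be applied within each run: lagging the whole `inflow` column would divide the first six hours of run 2 by the last six inflow values of run 1. Base R's `stats::lag()` cannot do the shift either, because on a plain vector it only changes the time attributes and leaves the values where they are. A minimal base-R sketch, assuming rows are sorted by time and no hours are missing within a run; `lag_n()` and `inflow_lag6` are hypothetical names introduced for illustration:

```r
# Toy data from the question: 5 runs x 24 hourly values
times <- seq(as.POSIXct("2012-01-01 00:00", tz = "UTC"),
             as.POSIXct("2012-01-01 23:00", tz = "UTC"), by = "hour")
df <- data.frame(
  time = rep(times, 5),
  run = factor(rep(1:5, each = 24)),
  inflow = rep(seq(1, 300, length.out = 24), 5),
  discharge = rep(seq(1, 180, length.out = 24), 5)
)

# Hypothetical helper: shift a vector down by n positions, padding the start with NA
lag_n <- function(x, n = 6) {
  c(rep(NA, min(n, length(x))), head(x, -n))
}

# Wrong: lagging the whole column lets run 2 borrow inflow from the end of run 1
naive <- 1 - df$discharge / lag_n(df$inflow, 6)

# Right: ave() applies lag_n() to each run separately and keeps the row order
df$inflow_lag6 <- ave(df$inflow, df$run, FUN = function(x) lag_n(x, 6))
df$scarcityfactor <- 1 - df$discharge / df$inflow_lag6
```

In the per-run version every run starts with six `NA` values; in the naive version rows 25 to 30 (hours 00:00 to 05:00 of run 2) silently get numbers computed from run 1.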

Then I want to calculate the mean, max and min of the scarcity factors over all runs (to find the highest, lowest and mean scarcity that could occur at every time step according to the different model runs). So I think I could just calculate a mean, max and min for every time step:

f1 <- function(x) c(Mean = mean(x), Max = max(x), Min = min(x))
results <- do.call(data.frame, aggregate(scarcityfactor ~ time,
                                         data = df,
                                         FUN = f1))
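Once `scarcityfactor` has been computed per run, the `aggregate()` call above works as written. A self-contained base-R sketch of both steps on the toy data, assuming time-ordered, gap-free rows within each run (`inflow_lag6` is a name introduced here, not from the question):

```r
# Toy data from the question: 5 runs x 24 hourly values
times <- seq(as.POSIXct("2012-01-01 00:00", tz = "UTC"),
             as.POSIXct("2012-01-01 23:00", tz = "UTC"), by = "hour")
df <- data.frame(
  time = rep(times, 5),
  run = factor(rep(1:5, each = 24)),
  inflow = rep(seq(1, 300, length.out = 24), 5),
  discharge = rep(seq(1, 180, length.out = 24), 5)
)

# Step 1: scarcity factor per run; ave() keeps the 6-row lag inside each run
inflow_lag6 <- ave(df$inflow, df$run,
                   FUN = function(x) c(rep(NA, 6), head(x, -6)))
df$scarcityfactor <- 1 - df$discharge / inflow_lag6

# Step 2: mean/max/min across runs for every time step
f1 <- function(x) c(Mean = mean(x), Max = max(x), Min = min(x))
results <- do.call(data.frame,
                   aggregate(scarcityfactor ~ time, data = df, FUN = f1))
```

The `do.call(data.frame, ...)` step flattens the matrix column returned by `f1` into `scarcityfactor.Mean`, `scarcityfactor.Max` and `scarcityfactor.Min`. The formula method of `aggregate()` uses `na.action = na.omit` by default, so the hours 00:00 to 05:00, which have no lagged inflow, are dropped rather than shown as `NA`, leaving 18 rows.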

Can anybody help me with the code?


Solution

    library(tidyverse)
    
    df %>%
      group_by(run) %>%
      mutate(scarcityfactor = 1 - discharge / lag(inflow,6)) %>%
      group_by(time) %>%
      summarise(Mean = mean(scarcityfactor), 
                Max = max(scarcityfactor), 
                Min = min(scarcityfactor))
    
    # # A tibble: 24 x 4
    #  time                   Mean     Max     Min
    #  <dttm>                <dbl>   <dbl>   <dbl>
    # 1 2012-01-01 00:00:00  NA      NA      NA    
    # 2 2012-01-01 01:00:00  NA      NA      NA    
    # 3 2012-01-01 02:00:00  NA      NA      NA    
    # 4 2012-01-01 03:00:00  NA      NA      NA    
    # 5 2012-01-01 04:00:00  NA      NA      NA    
    # 6 2012-01-01 05:00:00  NA      NA      NA    
    # 7 2012-01-01 06:00:00 -46.7   -46.7   -46.7  
    # 8 2012-01-01 07:00:00  -2.96   -2.96   -2.96 
    # 9 2012-01-01 08:00:00  -1.34   -1.34   -1.34 
    #10 2012-01-01 09:00:00  -0.776  -0.776  -0.776
    # # ... with 14 more rows
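The first six time steps are `NA` because no run has an inflow value six hours earlier. A variant of the answer's pipeline, sketched with `dplyr` alone, drops those rows and uses the `order_by` argument of `dplyr::lag()` so the lag follows `time` rather than the current row order:

```r
library(dplyr)

# Toy data from the question: 5 runs x 24 hourly values
df <- data.frame(
  time = rep(seq(as.POSIXct("2012-01-01 00:00", tz = "UTC"),
                 as.POSIXct("2012-01-01 23:00", tz = "UTC"), by = "hour"), 5),
  run = factor(rep(1:5, each = 24)),
  inflow = rep(seq(1, 300, length.out = 24), 5),
  discharge = rep(seq(1, 180, length.out = 24), 5)
)

results <- df %>%
  group_by(run) %>%
  # order_by sorts by time before shifting, then restores the row order
  mutate(scarcityfactor = 1 - discharge / lag(inflow, 6, order_by = time)) %>%
  # the first 6 hours of every run have no lagged inflow
  filter(!is.na(scarcityfactor)) %>%
  group_by(time) %>%
  summarise(Mean = mean(scarcityfactor),
            Max = max(scarcityfactor),
            Min = min(scarcityfactor))
```

`lag(inflow, 6)` still shifts by six rows, not six hours, so with 35 years of hourly data it is worth checking that no hours are missing within a run before relying on it.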