Tags: r, ggplot2, tidyverse, lubridate

How to set a specific date as the beginning of the year (water year)


I want to plot the average annual value of streamflow data using the water year, which starts in October and ends in September (for example, 10/01/1983 to 09/30/1984 is defined as water year 1984). I tried to find solutions elsewhere but failed.

Currently I'm using the following script to plot the annual average flow:

library(tidyverse)
library(lubridate)
library(ggplot2)

# df <- read_csv("dataframe.csv")  # columns: date (mm/dd/yyyy), flow

# Parse the mm/dd/yyyy strings into Date objects
df <- df %>%
  mutate(date = mdy(date))

# Calendar-year means: floor_date() rounds each date down to 1 January
df <- df %>%
  mutate(year = floor_date(date, "year")) %>%
  group_by(year) %>%
  summarize(avg = mean(flow))


y <- df$avg
x <- df$year  # already a Date (1 January of each year)
d <- data.frame(x = x, y = y)

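# Draw each bar as a stack of horizontal segments from 0 up to its mean
# (steps of 0.1), so the colour gradient can vary along the bar's height
vals <- lapply(d$y, function(y) seq(0, y, by = 0.1))
y <- unlist(vals)
mid <- rep(d$x, lengths(vals))
# each segment spans 100 days either side of the bar's date
d2 <- data.frame(x = mid - 100,
                 xend = mid + 100,
                 y = y,
                 yend = y)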

ggplot(data = d2, aes(x = x, xend = xend, y = y, yend = yend, color = y)) +
  geom_segment(linewidth = 2) +  # ggplot2 >= 3.4.0; use size = 2 on older versions
  scale_color_gradient2(low = "midnightblue", mid = "deepskyblue", high = "aquamarine",
                        midpoint = max(d2$y) / 2) +
  scale_x_date(date_breaks = "1 year", date_labels = "%Y", expand = c(0, 0)) +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5)) +
  labs(x = "Years", y = "Mean Annual Flow (cms)") +
  ggtitle("Mean Annual Flow, Rancho River at Eldorado (1983-2020)") +
  theme(plot.title = element_text(hjust = 0.5))

This gives the following plot, based on calendar years:
[Figure: mean annual flow by calendar year]

If I use water years instead, there should be no result for 1983, because the record starts on 10/01/1983, the first day of water year 1984.
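As a quick check of that expectation, the water-year rule can be written in base R. This is a minimal sketch; `water_year()` is a hypothetical helper name, not a function from lubridate or any other package. It simply adds one to the calendar year for dates in October, November and December.

```r
# Hypothetical helper: water year of a Date vector.
# October-December count toward the following year, so
# 1983-10-01 .. 1984-09-30 all map to water year 1984.
water_year <- function(d) {
  yr  <- as.integer(format(d, "%Y"))
  mon <- as.integer(format(d, "%m"))
  yr + (mon >= 10L)  # logical TRUE adds 1
}

water_year(as.Date(c("1983-09-30", "1983-10-01", "1984-09-30", "1984-10-01")))
# returns 1983 1984 1984 1985
```

Since the first observation (1983-10-01) maps to 1984, grouping by this value produces no 1983 group.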

The data frame is available at the following link:

https://drive.google.com/file/d/11PVub9avzMFhUz02cHfceGh9DrlVQDbD/view?usp=sharing

Kindly assist.


Solution

  • If a date falls on or after October 1 of its calendar year, it belongs to the next water year:

    df %>%
      mutate(date = mdy(date),
             year = year(date),
             # dates on or after Oct 1 belong to the next water year
             year = year + (date >= mdy(paste0("10/01/", year))))
    
    # A tibble: 5,058 x 3
       date        flow  year
       <date>     <dbl> <dbl>
     1 1983-10-01  3.31  1984
     2 1983-10-02  3.19  1984
     3 1983-10-03  3.7   1984
     4 1983-10-04  3.83  1984
     5 1983-10-05  3.44  1984
     6 1983-10-06  4.37  1984
     7 1983-10-07  6.78  1984
     8 1983-10-08  6.3   1984
     9 1983-10-09  6.46  1984
    10 1983-10-10  6.62  1984
    # … with 5,048 more rows
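The answer above only adds the water-year column. A minimal sketch of the remaining steps, assuming the linked CSV has `date` (mm/dd/yyyy) and `flow` columns, groups by that column, takes the mean, and builds the `d` data frame that the question's plotting code expects. The small `df` below is made-up stand-in data, not the real record. Each water year is placed at 1 January of that year (which always falls inside the water year), so `date_labels = "%Y"` prints the water-year number. `month(date) >= 10` is used here as an equivalent, shorter form of the answer's `date >= mdy(paste0("10/01/", year))` test.

```r
library(dplyr)
library(lubridate)

# Made-up stand-in data: two water years of daily flow, constant within each
# water year so the expected means are easy to check.
dates <- seq(as.Date("1983-10-01"), as.Date("1985-09-30"), by = "day")
df <- tibble(
  date = format(dates, "%m/%d/%Y"),                  # mm/dd/yyyy strings
  flow = ifelse(dates < as.Date("1984-10-01"), 2, 5)  # hypothetical flows
)

annual <- df %>%
  mutate(
    date = mdy(date),
    # October-December belong to the next water year
    year = year(date) + (month(date) >= 10)
  ) %>%
  group_by(year) %>%
  summarise(avg = mean(flow), .groups = "drop")

# x: 1 January of each water year (for scale_x_date), y: mean flow
d <- annual %>% transmute(x = make_date(year), y = avg)
d
```

This `d` can replace the `d <- data.frame(x = x, y = y)` step in the question's script; the segment interpolation and `ggplot()` call then work unchanged, and the first bar is labelled 1984 rather than 1983.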