Search code examples
rdatelinear-regressionmedian

Finding dates majority of data fall into (median/ CI?) in R


I have a dataset containing a column of dates (daily values summed by week) and a column of values representing bird mortality. I need to find out the dates the majority of mortality fall into. The values are not normally distributed with a major peak at the end of July due to increased surveying effort. Some weeks had zero observed mortalities.

Date Observed Value
2022-05-08 4
2022-05-22 1
2022-05-29 5
2022-06-05 1
2022-06-12 10
2022-06-19 64
2022-06-26 4
2022-07-10 800
2022-07-17 300
2022-07-24 207
2022-07-31 3002
2022-08-07 2

I'm not new to R but I am evidently rusty on my statistical analysis. Could anyone point me in the right direction?

I have tried simple things: estimating median of Date Observed (using summary and summaryStats(DateObserved, quartiles=TRUE)[[7]] from EnvStats library. But after plotting the values over time, I do not think the resulting quartiles look right. Also, tried fitting a linear model.

Thank you sincerely in advance for any leads!


Solution

  • library(tidyverse)
    df_1 <- read_tsv(file="DateObserved Value
    2022-05-08  4
    2022-05-22  1
    2022-05-29  5
    2022-06-05  1
    2022-06-12  10
    2022-06-19  64
    2022-06-26  4
    2022-07-10  800
    2022-07-17  300
    2022-07-24  207
    2022-07-31  3002
    2022-08-07  2") |> mutate(dt=as.numeric(DateObserved))
    
    dt_seq <- seq(from=min(df_1$dt),
                  to = max(df_1$dt))
    
    dt_seq_pretty_indx <- seq(from=min(df_1$dt),
                              to = max(df_1$dt),
                              length.out=6)
    dt_seq_pretty <- seq(from=min(df_1$DateObserved),
                         to = max(df_1$DateObserved),
                         length.out=6)
    
    
    plot(df_1$dt,
         df_1$Value, xaxt = "n")
    axis(side=1,
         at = dt_seq_pretty_indx,
         labels=dt_seq_pretty)
    
    
    
    myloess <- loess(Value ~ dt,df_1,span = 0.4)
    
    pred <- predict(myloess,newdata = data.frame(dt=dt_seq))
    
    lines(x=dt_seq,
          y=pred)
    
    library(Hmisc)
    wq <- wtd.quantile(x=dt_seq,
                 weights = pred,
                 probs = c(.25,.75))
    
    abline(v=wq[1],col="blue")
    abline(v=wq[2],col="blue")
    print(as.Date(wq,origin="1970-01-01"))
    

    enter image description here