How to estimate event dates across observation gaps of different sizes

I have a dataframe of dates when chickens hatched a chick or laid an egg. I want to obtain the hatch date and lay date for each unique nest and year combination.

However, there are many gaps in the dataframe where the farm was not visited for several days or weeks. I want to account for these gaps when estimating the dates, using the following rules:

If there are 0 eggs on day A, then a gap of 1 day on day B, then 1 or 2 eggs on day C, the lay date would be day B.
If there are 0 eggs on day A, then a gap of 2 days on day B and C, then 1 or 2 eggs on day D, the lay date would be day B.
If there are 0 eggs on day A, then a gap of 3 days on day B, C and D, then 1 or 2 eggs on day E, the lay date would be day C.
If there are 0 eggs on day A, then a gap of 4 or more days, then 1 or 2 eggs on day X, the lay date would be day X (because we cannot estimate if more than 3 days are unknown).
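
The four rules above can be sketched as a small base R helper. This is only an illustration: `estimate_event_date` is a hypothetical name, and returning the observed date when there is no gap at all is an assumption, since the rules do not cover that case explicitly.

```r
# Hypothetical helper: estimate an event date from the last visit without
# the event (`prev`) and the first visit with it (`curr`).
estimate_event_date <- function(prev, curr) {
  prev <- as.Date(prev)
  curr <- as.Date(curr)
  # Number of unvisited days between the two visits
  gap <- as.integer(curr - prev) - 1L
  # Offset from `prev`: 1 for a gap of 1 or 2 days, 2 for a gap of 3 days
  offset <- ifelse(gap %in% c(1L, 2L), 1L,
                   ifelse(gap == 3L, 2L, NA_integer_))
  # Assumption: no gap, or a gap of 4+ days, falls back to the observed date
  est <- curr
  use <- !is.na(offset)
  est[use] <- prev[use] + offset[use]
  est
}

# Days A = 2017-06-02 and C = 2017-06-04 give a 1-day gap
estimate_event_date("2017-06-02", "2017-06-04")
```

The function is vectorised, so it can be called on whole columns, for example with the previous visit date obtained from `dplyr::lag(date)`.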

BUT, if the FIRST observation for a specific nestyear already shows an egg or a chick, I want to exclude it, because we do not know how long that egg or chick has been there. In that case the lay/hatch date should be NA.

Here is an example of my data:

library(dplyr)

table <- "nestyear       date adults eggs chicks
1  2017_29 2017-06-01      1    0      0
2  2017_29 2017-06-02      1    0      0
3  2017_29 2017-06-04      1    1      0
4  2017_29 2017-06-05      1    2      0
5  2017_29 2017-06-07      1    1      1
6  2017_29 2017-06-08      2    0      2
7  2017_81 2017-06-01      1    0      0
8  2017_81 2017-06-06      1    1      0
9  2017_81 2017-06-07      1    1      0
10 2017_81 2017-06-08      1    2      0
11 2017_81 2017-06-10      1    1      1
12 2019_20 2019-06-01      1    1      0
13 2019_20 2019-06-02      1    0      1
14 2019_20 2019-06-03      1    0      0
15 2019_20 2019-06-09      1    0      1
16 2019_28 2019-06-01      1    0      0
17 2019_28 2019-06-02      1    0      0
18 2019_28 2019-06-03      1    0      0
19 2019_28 2019-06-05      2    2      0
20 2019_28 2019-06-14      2    0      1"

#Create a dataframe with the above table
df <- read.table(text=table, header = TRUE)
df

And here is what I have come up with so far:

First I wrote a loop to get the date when an egg appeared

nests <- unique(df$nestyear)
df$date.lay <- NA # Add an empty column for the lay flag
df_2 <- df[0 , ] # zero rows of all the same columns
for(i in seq_along(nests)){
  target_nest <- subset(df, nestyear == nests[i])
  # Rows with at least one egg (deduplicated on date + egg count)
  date_first_lay <- target_nest[!duplicated(paste0(as.Date(target_nest$date), target_nest$eggs)) & target_nest$eggs > 0, ]
  date_first_lay_2 <- rownames(date_first_lay)
  # Flag those rows with "Y" and every other row with "N"
  target_nest$date.lay <- "N"
  target_nest$date.lay[rownames(target_nest) %in% date_first_lay_2] <- "Y"
  df_2 <- rbind(df_2, target_nest) # append this nest's rows
}

The above code flags every visit on which eggs were seen, but I only want the FIRST egg laid for each nest.

So, from the date.lay column created above, we can pull the first date where date.lay = Y, which gives the first egg lay date for each nest.

The code below pulls the minimum date for each nestyear. However, it does so for both date.lay = Y and date.lay = N, so I then remove the N rows.

datelay <- df_2 %>%
  group_by(nestyear, date.lay) %>%
  filter(date == min(date))
head(datelay)

# Remove date.lay = N
datelay <- datelay %>%
  filter(date.lay == "Y")
head(datelay)
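
For comparison, the same "first visit with eggs" lookup can be sketched without the loop. This is a minimal sketch on a small demo table (two nests copied from the data above; `demo` and `first_seen` are names made up for the example). It only returns the raw first sighting and does not yet apply the gap rules or the first-observation exclusion.

```r
library(dplyr)

# Demo data: nests 2017_81 (rows 7-10) and 2019_28 from the table above
demo <- data.frame(
  nestyear = rep(c("2017_81", "2019_28"), times = c(4, 5)),
  date = as.Date(c("2017-06-01", "2017-06-06", "2017-06-07", "2017-06-08",
                   "2019-06-01", "2019-06-02", "2019-06-03", "2019-06-05",
                   "2019-06-14")),
  eggs = c(0, 1, 1, 2, 0, 0, 0, 2, 0)
)

# Keep visits with eggs, then take the earliest one per nestyear
first_seen <- demo %>%
  filter(eggs > 0) %>%
  group_by(nestyear) %>%
  slice_min(date, n = 1, with_ties = FALSE) %>%
  ungroup()

first_seen
```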

I then want to apply the same rules to find the date when the eggs hatched.

Essentially, my goal output would be this (based on the data above):

nest.year   date.lay    date.hatch
2017_29     2017-06-03  2017-06-06
2017_81     2017-06-06  2017-06-09
2019_20     NA          2019-06-02
2019_28     2019-06-04  2019-06-14

Any help would be appreciated.

Solution

df %>% 
    mutate(date = as.Date(date)) %>% 
    arrange(nestyear, date) %>% # just to be safe!
    group_by(nestyear) %>%
    # first_egg: first visit with eggs, ignoring the nest's first observation
    mutate(first_egg = eggs > 0 & lag(cummax(eggs), default = 0) == 0 & row_number() != 1,
           # date - lag(date) is the number of days since the previous visit
           date.lay = case_when(first_egg == FALSE ~ as.Date(NA),
                             date - lag(date) == 2 ~ lag(date) + 1,  # 1-day gap
                             date - lag(date) == 3 ~ lag(date) + 1,  # 2-day gap
                             date - lag(date) == 4 ~ lag(date) + 2)) # 3-day gap

My output doesn't quite match yours, so I may have misinterpreted your rules. In particular, I don't know how you want to classify the lay date when there is no gap, so that case (like a gap of 4 or more days) falls through case_when and returns NA here. I think you can adapt what I did to solve your problem. If I'm missing something important, please let me know.
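
One possible way to fill in the missing cases and cover hatch dates too is sketched below. It is not a definitive implementation: `first_event_date` is a hypothetical helper, the demo table contains only two nests copied from the question, and using the observed date when there is no gap is an assumption inferred from the goal output (nest 2019_20 hatching on 2019-06-02).

```r
library(dplyr)

# Demo data: nests 2017_29 and 2019_20 from the question's table
demo <- data.frame(
  nestyear = rep(c("2017_29", "2019_20"), times = c(6, 4)),
  date = as.Date(c("2017-06-01", "2017-06-02", "2017-06-04", "2017-06-05",
                   "2017-06-07", "2017-06-08",
                   "2019-06-01", "2019-06-02", "2019-06-03", "2019-06-09")),
  eggs   = c(0, 0, 1, 2, 1, 0, 1, 0, 0, 0),
  chicks = c(0, 0, 0, 0, 1, 2, 0, 1, 0, 1)
)

# Hypothetical helper: estimated date of the first visit where `count` > 0.
# Returns NA if the event is never seen or is already present at the first
# visit. Expects `date` sorted in ascending order.
first_event_date <- function(date, count) {
  idx <- which(count > 0)[1]  # first visit showing the event
  if (is.na(idx) || idx == 1) {
    return(as.Date(NA))
  }
  # Number of unvisited days before that visit
  gap <- as.integer(date[idx] - date[idx - 1]) - 1L
  if (gap %in% 1:2) {
    date[idx - 1] + 1   # 1 or 2 missing days: first missing day
  } else if (gap == 3) {
    date[idx - 1] + 2   # 3 missing days: middle missing day
  } else {
    date[idx]           # assumption: no gap or 4+ missing days
  }
}

result <- demo %>%
  arrange(nestyear, date) %>%
  group_by(nestyear) %>%
  summarise(date.lay   = first_event_date(date, eggs),
            date.hatch = first_event_date(date, chicks),
            .groups = "drop")

result
```

Using `summarise()` rather than `mutate()` gives one row per nestyear, which is the shape of the goal output. The first-observation exclusion is applied per event, so a nest whose first visit already shows an egg can still get a hatch date.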