I have a dataframe of dates when chickens hatched a chick or laid an egg. I want to obtain the hatch date and lay date for each unique nest and year combination.
However, there are many gaps in the dataframe where the farm was not visited for several days/weeks. I want to consider this when I calculate, where:
BUT, if the FIRST observation for a specific nestyear is an egg or a chick, I want to exclude it, because we do not know how long that egg or chick has been there. So that would also be an NA for the hatch/lay date.
Here is an example of my data:
library(dplyr)
table <- "nestyear date adults eggs chicks
1 2017_29 2017-06-01 1 0 0
2 2017_29 2017-06-02 1 0 0
3 2017_29 2017-06-04 1 1 0
4 2017_29 2017-06-05 1 2 0
5 2017_29 2017-06-07 1 1 1
6 2017_29 2017-06-08 2 0 2
7 2017_81 2017-06-01 1 0 0
8 2017_81 2017-06-06 1 1 0
9 2017_81 2017-06-07 1 1 0
10 2017_81 2017-06-08 1 2 0
11 2017_81 2017-06-10 1 1 1
12 2019_20 2019-06-01 1 1 0
13 2019_20 2019-06-02 1 0 1
14 2019_20 2019-06-03 1 0 0
15 2019_20 2019-06-09 1 0 1
16 2019_28 2019-06-01 1 0 0
17 2019_28 2019-06-02 1 0 0
18 2019_28 2019-06-03 1 0 0
19 2019_28 2019-06-05 2 2 0
20 2019_28 2019-06-14 2 0 1"
#Create a dataframe with the above table
df <- read.table(text=table, header = TRUE)
df
And here is what I have come up with so far:
First I wrote a loop to get the date when an egg appeared
nests <- unique(df$nestyear)
df$date.lay <- NA # Add an empty date.lay column
df_2 <- df[0, ] # zero rows, same columns as df
for (i in seq_along(nests)) {
  target_nest <- subset(df, nestyear == nests[i])
  # Keep rows with at least one egg, dropping repeats of the same date/egg-count pair
  date_first_lay <- target_nest[!duplicated(paste0(as.Date(target_nest$date), target_nest$eggs)) & target_nest$eggs > 0, ]
  date_first_lay_2 <- rownames(date_first_lay)
  # Flag those rows with "Y" and everything else with "N"
  target_nest$date.lay <- "N"
  target_nest$date.lay[rownames(target_nest) %in% date_first_lay_2] <- "Y"
  df_2 <- rbind(df_2, target_nest)
}
The above code gives us each time a new egg is laid, but I only want the FIRST egg laid for each nest.
So, from the column date.lay created above, we can pull the first time date.lay = Y, and this will give us the first egg lay date for each nest.
This code pulls the minimum date for each nestyear. However, it pulls the minimum date for both date.lay = Y and date.lay = N, so I also remove the date.lay = N rows.
datelay <- df_2 %>%
  group_by(nestyear, date.lay) %>%
  filter(date == min(date))
head(datelay)
# Remove date.lay = N
datelay <- datelay %>%
  filter(date.lay == "Y")
head(datelay)
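For comparison, the loop and the two filters above can probably be collapsed into a single grouped summary. This is only a sketch: `obs` is a small stand-in for the data in the question (two nests, made up for illustration from the rows above), and `first_egg_seen` is a hypothetical column name. Note that it returns the date the first egg was observed, so it does not yet adjust for visit gaps or drop nests whose first observation already contains an egg.

```r
library(dplyr)

# Small stand-in for the question's data (two nests only; an assumption for illustration)
obs <- data.frame(
  nestyear = c("2017_29", "2017_29", "2017_29", "2019_20", "2019_20"),
  date     = as.Date(c("2017-06-01", "2017-06-04", "2017-06-05",
                       "2019-06-01", "2019-06-02")),
  eggs     = c(0, 1, 2, 1, 0)
)

# Keep only visits where at least one egg was seen,
# then take the earliest such date per nestyear
first_egg <- obs %>%
  filter(eggs > 0) %>%
  group_by(nestyear) %>%
  summarise(first_egg_seen = min(date), .groups = "drop")

first_egg
```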
Then I want to apply the same rules to find when the eggs hatched.
Essentially, my goal output would be this (based on the data above):
nest.year date.lay date.hatch
2017_29 2017-06-03 2017-06-06
2017_81 2017-06-06 2017-06-09
2019_20 NA 2019-06-02
2019_28 2019-06-04 2019-06-14
Any help would be appreciated.
df %>%
  mutate(date = as.Date(date)) %>%
  arrange(nestyear, date) %>% # just to be safe!
  group_by(nestyear) %>%
  # first_egg: first visit with eggs in this nest, unless it is the nest's very first observation
  mutate(first_egg = eggs > 0 & lag(cummax(eggs), default = 0) == 0 & row_number() != 1,
         # date.lay: estimated from the gap (in days) since the previous visit
         date.lay = case_when(first_egg == FALSE ~ as.Date(NA),
                              date - lag(date) == 2 ~ lag(date) + 1,
                              date - lag(date) == 3 ~ lag(date) + 1,
                              date - lag(date) == 4 ~ lag(date) + 2))
My output doesn't quite match your output, so maybe I misinterpreted your rules. I don't know how to classify the lay date if there is no gap. But I think you can adapt what I did to solve your problem. If I'm missing something important, then please let me know.
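One way to adapt this to one row per nest, covering both eggs and chicks, is sketched below. Everything here rests on assumptions: `first_event_date()` is a hypothetical helper, `obs` is a two-nest subset of the question's data, and the gap rules are guesses. The 2/3/4-day offsets come from the `case_when()` above; using the observed date for any other gap (including no gap) is only my reading of the goal output, not a rule stated in the question.

```r
library(dplyr)

# Two nests copied from the question's data (subset for illustration)
obs <- data.frame(
  nestyear = rep(c("2017_29", "2019_20"), times = c(6, 4)),
  date     = as.Date(c("2017-06-01", "2017-06-02", "2017-06-04",
                       "2017-06-05", "2017-06-07", "2017-06-08",
                       "2019-06-01", "2019-06-02", "2019-06-03", "2019-06-09")),
  eggs     = c(0, 0, 1, 2, 1, 0,  1, 0, 0, 0),
  chicks   = c(0, 0, 0, 0, 1, 2,  0, 1, 0, 1)
)

# Hypothetical helper: estimated date of the first visit with count > 0.
# Returns NA if the event is never seen or is already present on the first visit.
first_event_date <- function(count, date) {
  idx <- which(count > 0)[1]
  if (is.na(idx) || idx == 1) return(as.Date(NA))
  prev <- date[idx - 1]
  gap  <- as.integer(date[idx] - prev)  # days since the previous visit
  if (gap %in% c(2, 3)) {
    prev + 1                            # assumed rule: 2- or 3-day gap
  } else if (gap == 4) {
    prev + 2                            # assumed rule: 4-day gap
  } else {
    date[idx]                           # assumed rule: any other gap, use observed date
  }
}

result <- obs %>%
  arrange(nestyear, date) %>%
  group_by(nestyear) %>%
  summarise(date.lay   = first_event_date(eggs, date),
            date.hatch = first_event_date(chicks, date),
            .groups = "drop")

result
```

For these two nests the sketch gives a lay date of 2017-06-03 and a hatch date of 2017-06-06 for 2017_29, and NA and 2019-06-02 for 2019_20. If the real rules differ, only the `if`/`else` branches in the helper should need changing.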