I have a dataframe (df1) with multiple rows species, and the date-time event at which that species occurs that looks like this:
df1 <- as.data.frame(sample(seq(from=as.POSIXct("2023-07-01 00:00"),
to=as.POSIXct("2023-07-01 00:20"), by="sec"), 21))
df1
colnames(df1) <- c('day.hour.df1') #rename column of df2
df1$Species <- c("a", "b", "b", "a", "c", "NA", "a", "a", "c", "b", "b",
"c", "c", "NA", "a", "a", "b", "b", "a", "NA", "b")
# add species column
names(df1)
And I have a second dataframe (df2) that, as columns, contains a start (as day and time) (START.df2) an end (END.df2), and species (here "a"). the second dataframe looks like follows:
df2 <- as.data.frame(seq(from=as.POSIXct("2023-07-01 00:00:00"),
to=as.POSIXct("2023-07-01 00:20:00"), by="min"))
df2
df2$time2 <- (seq(from=as.POSIXct("2023-07-01 00:00:59"),
to=as.POSIXct("2023-07-01 00:20:59"), by="min"))
names(df1)
df2$species <- (c('a', 'a', 'a', 'a', 'a', 'a', 'a', 'a', 'a', 'a', 'a',
'a', 'a', 'a', 'a', 'a', 'a', 'a', 'a', 'a', 'a'))
df2
colnames(df1) <- c('START.df2', 'END.df2', 'Species')
I want to add to the second dataframe (df2) a new column called "occurrence_a", where, if the species "a" contained in df1 occurs at a day_time (day.hour.df1) in between START (START.df2) and END (END.df2), of dataframe 2 (df2), it gives a 1, otherwise, it gives 0.
I tried this but without success:
library(dplyr)
df2 %>% left_join(df1, by = "Species") %>%
mutate( = between(day.hour.df2, START.df1, END.df1)) %>%
group_by(species, day.hour.df1) %>%
summarise(OCCURRENCEa = any(OCCURRENCEa))
df2
I'm not entirely sure what output you want, but I think this should be close at least:
left_join(df2, df1, join_by(Species, between(y$day.hour.df1, x$START.df2, x$END.df2))) %>%
group_by(Species, START.df2, END.df2) %>%
summarise(
OCCURRENCE = any(!is.na(day.hour.df1)) %>% as.numeric(),
.groups = 'drop'
) %>%
pivot_wider(names_from = Species, values_from = OCCURRENCE, names_prefix = 'OCCURRENCE')
You don't really need the pivotting and such for this example, but I am assuming you have multiple species in your real data and want different columns for those.