Search code examples
rdplyrtimeseries

Counting the occurence in the regular time spans in R


I have a data frame with year, site and pointer.

year<-c(1200, 1201, 1202, 1204)
site_a<-c("0", "1", "1", "0")
site_b<-c("0", "0", "1", "0")
site_c<-c("0", "-1", "1", "0")
df<-data.frame(year,site_a, site_b, site_c)
lw2<-pivot_longer(df, cols=site_a:site_c, names_to="site", values_to="pointer")

I want to count the number of pointer (1 and -1) in two years spans like:

intval<-2
lw2%>%
  group_by(site, pointer)%>%
  filter(pointer == 1 | pointer == -1)%>%
  group_by(pointer, add = TRUE)%>%
  summarize(startDate = seq(from=1200, to=1204, by=2),
            endDate = startDate + intval-1, 
            frequency = n())

As a result I am getting the same, total number of pointers for each time spans,

site pointer startDate endDate frequency
site_a 1 1200 1201 2
site_a 1 1202 1203 2
site_a 1 1204 1205 2

while I want to get number of pointers for each time span:

site pointer startDate endDate frequency
site_a 1 1200 1201 1
site_a 1 1202 1203 1
site_a 1 1204 1205 0

Do you have suggestions on how to get this?


Solution

  • Here's how I would do it.

    library(dplyr)
    library(tidyr)
    
    lw2 %>%
      right_join(expand(., year = 1200:1205, site)) %>%
      group_by(year%/%2, site) %>%
      summarise(startDate = min(year),
                endDate = max(year),
                frequency = sum(pointer != 0, na.rm = T),
                .groups = 'drop') %>%
      select(-1)
    
    #> # A tibble: 9 × 4
    #>   site   startDate endDate frequency
    #>   <chr>      <dbl>   <dbl>     <int>
    #> 1 site_a      1200    1201         1
    #> 2 site_b      1200    1201         0
    #> 3 site_c      1200    1201         1
    #> 4 site_a      1202    1203         1
    #> 5 site_b      1202    1203         1
    #> 6 site_c      1202    1203         1
    #> 7 site_a      1204    1205         0
    #> 8 site_b      1204    1205         0
    #> 9 site_c      1204    1205         0