Search code examples
rdataframedifftime

Group by specific date sequence in r


I have a table with order | place | date and want to group and count each order at one place in a row if it was on the next day from the previous order (time difference <= 1 day). Grouped orders to show earlier date and instances count. Please see example below.

Input:

| Order | Place | Date     |
|-------|-------|----------|
| 11    | A     | 01.11.19 |
| 11    | A     | 02.11.19 |
| 11    | A     | 05.11.19 |
| 22    | B     | 01.11.19 |
| 22    | D     | 02.11.19 |
| 22    | D     | 03.11.19 |
| 33    | A     | 01.11.19 |

Output:

| Order | Place | Date     | Count |
|-------|-------|----------|-------|
| 11    | A     | 01.11.19 | 2     |
| 11    | A     | 05.11.19 | 1     |
| 22    | B     | 01.11.19 | 1     |
| 22    | D     | 02.11.19 | 2     |
| 33    | A     | 01.11.19 | 1     |

Solution

  • Here is another solution using dplyr:

    library(dplyr)
    
    df1 %>% 
      mutate(Date = as.Date(Date, "%d.%m.%y")) %>% 
      group_by(Order, Place, DateLag = !((Date - lag(Date, default = first(Date)))>1)) %>% 
      summarise(Date = first(Date), Count = n()) %>% 
      ungroup %>% select(-DateLag) %>% 
      arrange(Order, Place, Date)
    
    ># # A tibble: 5 x 4
    >#   Order Place Date       Count
    >#   <int> <chr> <date>     <int>
    ># 1    11 A     2019-11-01     2
    ># 2    11 A     2019-11-05     1
    ># 3    22 B     2019-11-01     1
    ># 4    22 D     2019-11-02     2
    ># 5    33 A     2019-11-01     1
    

    Data:

    read.table(text=" Order  Place  Date     
     11     A      01.11.19 
     11     A      01.11.19 
     11     A      05.11.19 
     22     B      01.11.19 
     22     D      02.11.19 
     22     D      03.11.19 
     33     A      01.11.19", header=T, stringsAsFactors=F) -> df1