Search code examples
rdate-arithmeticweek-number

Group dates by week in R


I wish to label my data based on their week. This is my data:

df2 <- structure(list(Order_Date = structure(c(16735, 16805, 16753, 
16830, 17075, 17009, 17085, 16740, 16891, 16750, 16820, 16849, 
16906, 16929, 16746, 16731, 16786, 16873, 16895, 16931), class = "Date")), .Names = "Order_Date", row.names = c(NA, 
-20L), class = "data.frame")

and I tried to label them based on the week (0th week, 1th week and ....) and I wish to group my data by the week later

And I tried this:

# order by data
library (dplyr)
df2<- arrange(df2, Order_Date)


# label them by week
$df2$week <- cumsum(weekdays(df2$Order_Date) == "Friday")

It doesn't give me the correct result and I have the following output, which is weird

   Order_Date week
1  2015-10-27    0
2  2016-01-05    0
3  2015-11-14    0
4  2016-01-30    0
5  2016-10-01    0
6  2016-07-27    0
7  2016-10-11    0
8  2015-11-01    0
9  2016-03-31    0
10 2015-11-11    0
11 2016-01-20    0
12 2016-02-18    0
13 2016-04-15    1
14 2016-05-08    1
15 2015-11-07    1
16 2015-10-23    2
17 2015-12-17    2
18 2016-03-13    2
19 2016-04-04    2
20 2016-05-10    2

Ideally, I'd like to have this output:

  Order_Date   label
1  2015-10-23   0
2  2015-10-27   0
3  2015-11-01   1
4  2015-11-07   2
5  2015-11-11   2
6  2015-11-14   3
7  2015-12-17   8
8  2016-01-05   10

since row number 8 occurs 10 week after row number 1 but also a solution that generates the following is my second alternative that shows these data are not in the same week:

  Order_Date   label
1  2015-10-23   0
2  2015-10-27   0
3  2015-11-01   1
4  2015-11-07   2
5  2015-11-11   2
6  2015-11-14   3
7  2015-12-17   4
8  2016-01-05   5

Solution

  • The code below calculates the current week relative to the minimum week in the data. week2 uses modular arithmetic to make the code more concise, although the week numbers don't always line up exactly with the direct calculation of years and week numbers using lubridate functions.

    library(dplyr)
    library(lubridate)
    
    df2 %>% mutate(week = (year(Order_Date) - year(min(Order_Date)))*52 + 
                     week(Order_Date) - week(min(Order_Date)),
                   week2 = (as.numeric(Order_Date) %/% 7) - (as.numeric(min(Order_Date)) %/% 7)) %>%
      arrange(Order_Date)
    
       Order_Date week week2
    1  2015-10-23    0     0
    2  2015-10-27    0     0
    3  2015-11-01    1     1
    4  2015-11-07    2     2
    5  2015-11-11    2     2
    6  2015-11-14    3     3
    7  2015-12-17    8     8
    8  2016-01-05   10    10
    9  2016-01-20   12    12
    10 2016-01-30   14    14
    11 2016-02-18   16    17
    12 2016-03-13   20    20
    13 2016-03-31   22    23
    14 2016-04-04   23    23
    15 2016-04-15   25    25
    16 2016-05-08   28    28
    17 2016-05-10   28    28
    18 2016-07-27   39    39
    19 2016-10-01   49    49
    20 2016-10-11   50    50