Search code examples
rdplyrgroup-summaries

Conditional and grouped summaries by week dplyr


Complicating a previous question, lets say I have the following sock data.

>socks
year drawer  week  sock_total
1990  1       1        3                    
1990  1       2        4
1990  1       3        3 
1990  1       4        2 
1990  1       5        4
1990  2       1        1           
1990  2       2        1
1990  2       3        1
1990  2       4        1 
1990  2       5        2
1990  3       1        3
1990  3       2        4 
1990  3       3        4
1990  3       4        4
1990  3       5        4
1991  1       1        4
1991  1       2        3
1991  1       3        2
1991  1       4        2 
1991  1       5        3
1991  2       1        1           
1991  2       2        3
1991  2       3        4
1991  2       4        4
1991  2       5        3
1991  3       1        2           
1991  3       2        3
1991  3       3        3
1991  3       4        2
1991  3       5        3

How can I use summarise in dplyr to create a new variable growth which equals 1 if their was an increase in each week between the first year and the second year-- else 0. The data should look like this

>socks
 drawer  week growth 
  1       1        1        
  1       2        0   
  1       3        0   
  1       4        0   
  1       5        0   
  2       1        0        
  2       2        1   
  2       3        1   
  2       4        1   
  2       5        1   
  3       1        0   
  3       2        0   
  3       3        0   
  3       4        0   
  3       5        0

Also, how would you handle data where a drawer did not have a corresponding week in one of the years. aka add NA if a week was missing.


Solution

  • The answer would be very similar to the previous, but group by drawer and week, comment by @eipi10 is also a great option; You can handle missing year for a specific drawer and week by using index after the subset, which turns a length zero object into NA:

    For instance:

    df %>% 
        group_by(drawer, week) %>% 
        summarise(growth = +(sock_total[year==1991][1] - sock_total[year==1990][1] > 0))
    #                                              ^^^                         ^^^
    # A tibble: 15 x 3
    # Groups:   drawer [?]
    #   drawer  week growth
    #    <int> <int>  <int>
    # 1      1     1      1
    # 2      1     2      0
    # 3      1     3      0
    # 4      1     4      0
    # 5      1     5      0
    # 6      2     1      0
    # 7      2     2      1
    # 8      2     3      1
    # 9      2     4      1
    #10      2     5      1
    #11      3     1      0
    #12      3     2      0
    #13      3     3      0
    #14      3     4      0
    #15      3     5     NA
    

    The data has left out the year 1991 for drawer 3 and week 5:

    structure(list(year = c(1990L, 1990L, 1990L, 1990L, 1990L, 1990L, 
    1990L, 1990L, 1990L, 1990L, 1990L, 1990L, 1990L, 1990L, 1990L, 
    1991L, 1991L, 1991L, 1991L, 1991L, 1991L, 1991L, 1991L, 1991L, 
    1991L, 1991L, 1991L, 1991L, 1991L), drawer = c(1L, 1L, 1L, 1L, 
    1L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 1L, 1L, 1L, 1L, 1L, 
    2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L), week = c(1L, 2L, 3L, 4L, 
    5L, 1L, 2L, 3L, 4L, 5L, 1L, 2L, 3L, 4L, 5L, 1L, 2L, 3L, 4L, 5L, 
    1L, 2L, 3L, 4L, 5L, 1L, 2L, 3L, 4L), sock_total = c(3L, 4L, 3L, 
    2L, 4L, 1L, 1L, 1L, 1L, 2L, 3L, 4L, 4L, 4L, 4L, 4L, 3L, 2L, 2L, 
    3L, 1L, 3L, 4L, 4L, 3L, 2L, 3L, 3L, 2L)), .Names = c("year", 
    "drawer", "week", "sock_total"), class = "data.frame", row.names = c(NA, 
    -29L))