Complicating a previous question, lets say I have the following sock data.
>socks
year drawer week sock_total
1990 1 1 3
1990 1 2 4
1990 1 3 3
1990 1 4 2
1990 1 5 4
1990 2 1 1
1990 2 2 1
1990 2 3 1
1990 2 4 1
1990 2 5 2
1990 3 1 3
1990 3 2 4
1990 3 3 4
1990 3 4 4
1990 3 5 4
1991 1 1 4
1991 1 2 3
1991 1 3 2
1991 1 4 2
1991 1 5 3
1991 2 1 1
1991 2 2 3
1991 2 3 4
1991 2 4 4
1991 2 5 3
1991 3 1 2
1991 3 2 3
1991 3 3 3
1991 3 4 2
1991 3 5 3
How can I use summarise
in dplyr
to create a new variable
growth
which equals 1
if their was an increase in each week between the first year and the second year-- else 0
. The data should look like this
>socks
drawer week growth
1 1 1
1 2 0
1 3 0
1 4 0
1 5 0
2 1 0
2 2 1
2 3 1
2 4 1
2 5 1
3 1 0
3 2 0
3 3 0
3 4 0
3 5 0
Also, how would you handle data where a drawer did not have a corresponding week in one of the years. aka add NA
if a week was missing.
The answer would be very similar to the previous, but group by drawer
and week
, comment by @eipi10 is also a great option; You can handle missing year for a specific drawer
and week
by using index after the subset
, which turns a length zero object into NA:
For instance:
df %>%
group_by(drawer, week) %>%
summarise(growth = +(sock_total[year==1991][1] - sock_total[year==1990][1] > 0))
# ^^^ ^^^
# A tibble: 15 x 3
# Groups: drawer [?]
# drawer week growth
# <int> <int> <int>
# 1 1 1 1
# 2 1 2 0
# 3 1 3 0
# 4 1 4 0
# 5 1 5 0
# 6 2 1 0
# 7 2 2 1
# 8 2 3 1
# 9 2 4 1
#10 2 5 1
#11 3 1 0
#12 3 2 0
#13 3 3 0
#14 3 4 0
#15 3 5 NA
The data has left out the year 1991 for drawer 3 and week 5:
structure(list(year = c(1990L, 1990L, 1990L, 1990L, 1990L, 1990L,
1990L, 1990L, 1990L, 1990L, 1990L, 1990L, 1990L, 1990L, 1990L,
1991L, 1991L, 1991L, 1991L, 1991L, 1991L, 1991L, 1991L, 1991L,
1991L, 1991L, 1991L, 1991L, 1991L), drawer = c(1L, 1L, 1L, 1L,
1L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 1L, 1L, 1L, 1L, 1L,
2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L), week = c(1L, 2L, 3L, 4L,
5L, 1L, 2L, 3L, 4L, 5L, 1L, 2L, 3L, 4L, 5L, 1L, 2L, 3L, 4L, 5L,
1L, 2L, 3L, 4L, 5L, 1L, 2L, 3L, 4L), sock_total = c(3L, 4L, 3L,
2L, 4L, 1L, 1L, 1L, 1L, 2L, 3L, 4L, 4L, 4L, 4L, 4L, 3L, 2L, 2L,
3L, 1L, 3L, 4L, 4L, 3L, 2L, 3L, 3L, 2L)), .Names = c("year",
"drawer", "week", "sock_total"), class = "data.frame", row.names = c(NA,
-29L))