I am trying to use rep with dplyr, but I do not fully understand why I cannot make it work.
My data look like this. What I want is simply to repeat each dayweek value n times for each id.
head(dt4)
  id  dayweek n
1  1   Friday 3
2  1   Monday 3
3  1 Saturday 3
4  1   Sunday 3
5  1 Thursday 3
6  1  Tuesday 3
What I am trying to do is this, but within a dplyr flow:
cbind(rep(dt4$id, dt4$n), rep(as.character(dt4$dayweek), dt4$n) )
which gives
     [,1] [,2]
[1,] "1"  "Friday"
[2,] "1"  "Friday"
[3,] "1"  "Friday"
[4,] "1"  "Monday"
[5,] "1"  "Monday"
[6,] "1"  "Monday"
I do not understand why this code does not work
dt4 %>%
group_by(id) %>%
summarise(rep(dayweek, n))
Error: expecting a single value
Could someone help me with this?
The data:
dt4 = structure(list(id = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L), dayweek = structure(c(1L,
2L, 3L, 4L, 5L, 6L, 7L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 1L, 2L, 3L,
4L, 5L, 6L, 7L), .Label = c("Friday", "Monday", "Saturday", "Sunday",
"Thursday", "Tuesday", "Wedesnday"), class = "factor"), n = c(3,
3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3)), class = "data.frame", .Names = c("id",
"dayweek", "n"), row.names = c(NA, -21L))
To get the same result as cbind, we can use do. As @DavidArenburg mentioned, summarise outputs a single value/row per group combination, whereas mutate returns output with the same number of rows as the input. Here we are doing a different operation, one that changes the number of rows, and that can be done within the do environment. In the code, the dot (.) signifies the dataset, here the subset of rows for the current group. If we wanted to extract the 'id' column from dt4, we could use either dt4$id or dt4[['id']]; inside do, replace dt4 with the dot.
library(dplyr)
dt4 %>%
group_by(id) %>%
do(data.frame(id=.$id, v1=rep(.$dayweek, .$n)))
#Source: local data frame [63 x 2]
#Groups: id
# id v1
#1 1 Friday
#2 1 Friday
#3 1 Friday
#4 1 Monday
#5 1 Monday
#6 1 Monday
#7 1 Saturday
#8 1 Saturday
#9 1 Saturday
#10 1 Sunday
#.. .. ...
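The summarise/mutate distinction mentioned above can be checked on a small example. This is only a sketch: toy, per_group and per_row are made-up names, and the values are not from the question's dt4.

```r
library(dplyr)

# Hypothetical toy data, not the dt4 from the question
toy <- data.frame(id = c(1L, 1L, 2L), n = c(2, 1, 3))

# summarise() collapses each group to a single row
per_group <- toy %>%
  group_by(id) %>%
  summarise(total = sum(n))

# mutate() keeps every input row and adds the group total to each
per_row <- toy %>%
  group_by(id) %>%
  mutate(total = sum(n))

nrow(per_group)  # one row per id
nrow(per_row)    # same number of rows as toy
```

Neither verb is meant to return more rows than it was given, which is why expanding rows with rep needs a different tool.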
Or another option, based on @Frank's comments, would be to specify the row index generated from rep inside slice and then select the columns that we need to keep.
dt4 %>%
slice(rep(1:n(),n)) %>%
select(-n)
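To see what the row index passed to slice looks like, here is a minimal base R sketch of the same idea. It assumes a small stand-in data frame (toy, idx and expanded are illustrative names, not part of the original question or answer).

```r
# Toy stand-in for dt4 (hypothetical values, not the question's data)
toy <- data.frame(id = c(1L, 1L, 2L),
                  dayweek = c("Friday", "Monday", "Friday"),
                  n = c(2, 1, 3),
                  stringsAsFactors = FALSE)

# Row number i is repeated n[i] times
idx <- rep(seq_len(nrow(toy)), toy$n)
idx
# 1 1 2 3 3 3

# Indexing with the repeated row numbers expands the data frame;
# selecting only id and dayweek drops the n column
expanded <- toy[idx, c("id", "dayweek")]
nrow(expanded)
# 6
```

The slice call above does the same thing inside a pipeline: 1:n() yields the row numbers and the n column supplies the repeat counts.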