Applying rep() within groups through dplyr


I've been trying to generate an alternating sequence of 1 and 2 within groups. I'd like to use rep and dplyr, but I can't work out how to apply rep within groups. So far I've only managed it by manually separating the groups and specifying the correct range for each one. I would like to know how rep could be applied using dplyr.

Here's some sample data.

df <- data.frame(date = c("2017-01-01", "2017-01-01", "2017-01-01", "2017-01-01", "2017-01-01", "2017-01-01", "2017-01-01", "2017-01-02", "2017-01-02", "2017-01-02", "2017-01-02", "2017-01-02", "2017-01-02", "2017-01-02", "2017-01-02", "2017-01-02", "2017-01-02", "2017-01-02"),
                 loc =c("AB", "AB", "AB", "AB", "AB", "AB", "AB", "AB", "CD", "CD", "CD", "CD", "CD", "CD", "CD", "CD", "CD", "CD"),
                 cat = c("a", "a", "a", "b", "b", "b", "b", "b", "c", "c", "c", "c", "c", "d", "d", "d", "d", "d"))

This is essentially the code I run for each group, here applied to the entire dataset.

df$type <- rep(1:2, nrow(df) / 2)

As you can see, the output disregards the column cat: the sequence should restart at 1 for cat b and cat d.

         date loc cat type
1  2017-01-01  AB   a    1
2  2017-01-01  AB   a    2
3  2017-01-01  AB   a    1
4  2017-01-01  AB   b    2
5  2017-01-01  AB   b    1
6  2017-01-01  AB   b    2
7  2017-01-01  AB   b    1
8  2017-01-02  AB   b    2
9  2017-01-02  CD   c    1
10 2017-01-02  CD   c    2
11 2017-01-02  CD   c    1
12 2017-01-02  CD   c    2
13 2017-01-02  CD   c    1
14 2017-01-02  CD   d    2
15 2017-01-02  CD   d    1
16 2017-01-02  CD   d    2
17 2017-01-02  CD   d    1
18 2017-01-02  CD   d    2

UPDATE: Here's the desired output.

         date loc cat type
1  2017-01-01  AB   a    1
2  2017-01-01  AB   a    2
3  2017-01-01  AB   a    1
4  2017-01-01  AB   b    1
5  2017-01-01  AB   b    2
6  2017-01-01  AB   b    1
7  2017-01-01  AB   b    2
8  2017-01-02  AB   b    1
9  2017-01-02  CD   c    1
10 2017-01-02  CD   c    2
11 2017-01-02  CD   c    1
12 2017-01-02  CD   c    2
13 2017-01-02  CD   c    1
14 2017-01-02  CD   d    1
15 2017-01-02  CD   d    2
16 2017-01-02  CD   d    1
17 2017-01-02  CD   d    2
18 2017-01-02  CD   d    1

Solution

  • Assuming that cat is the only relevant grouping variable here (not date and loc), you can do:

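    library(dplyr)
    df <- df %>%
        # restart the 1, 2 pattern for every value of cat
        group_by(cat) %>%
        # n() is the current group's row count; length.out recycles 1:2 to that size
        mutate(type = rep(1:2, length.out = n()))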
    # Output:
             date    loc    cat  type
           <fctr> <fctr> <fctr> <int>
    1  2017-01-01     AB      a     1
    2  2017-01-01     AB      a     2
    3  2017-01-01     AB      a     1
    4  2017-01-01     AB      b     1
    5  2017-01-01     AB      b     2
    6  2017-01-01     AB      b     1
    7  2017-01-01     AB      b     2
    8  2017-01-02     AB      b     1
    9  2017-01-02     CD      c     1
    10 2017-01-02     CD      c     2
    11 2017-01-02     CD      c     1
    12 2017-01-02     CD      c     2
    13 2017-01-02     CD      c     1
    14 2017-01-02     CD      d     1
    15 2017-01-02     CD      d     2
    16 2017-01-02     CD      d     1
    17 2017-01-02     CD      d     2
    18 2017-01-02     CD      d     1
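
To see why the answer uses length.out rather than a repeat count, here is a small base R sketch. The names sizes and cat_col are made up for illustration (they mirror the sizes of groups a and b in the sample data), and the ave() call at the end is only one possible base R equivalent of the grouped mutate, not part of the original answer.

```r
# Sketch only: `sizes` is a hypothetical vector of group sizes (3 and 5, like cat a and b).
sizes <- c(3, 5)

# With a repeat count, a fractional value is truncated, so odd-sized groups come up one short.
short <- sapply(sizes, function(n) length(rep(1:2, n / 2)))            # 2 and 4

# With length.out, the 1:2 pattern is recycled to exactly the group size.
exact <- sapply(sizes, function(n) length(rep(1:2, length.out = n)))   # 3 and 5

# The same per-group alternation in base R, for comparison with the dplyr version.
cat_col <- rep(c("a", "b"), times = sizes)
type <- ave(seq_along(cat_col), cat_col,
            FUN = function(i) rep(1:2, length.out = length(i)))
type   # 1 2 1 1 2 1 2 1
```

This is also why the rep(1:2, nrow(df)/2) approach from the question would not carry over to groups directly: it only fills a group completely when the group has an even number of rows.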