Search code examples
rdplyrpivotdata-manipulation

dplyr: pivot longer while keeping some wide/stacked


I'm trying to pivot a wide dataset to long and could not find a solution to my problem through searching here. I have a wide dataset that includes 3 friendship nominations over 3 waves, and then 3 follow up questions about each nomination at each wave. Sample data below shows the structure. The follow up questions are those with a, b, or c in them, and the number corresponds to the friend column at that wave (ex, w1a1 is wave 1 follow up question for friend 1 (w1fid1)). I have included a sample dataset of what I have and what I want:

Sample data:

df<-read.table(text= "id    w1fid1  w1fid2  w1fid3  w2fid1  w2fid2  w2fid3  w3fid1  w3fid2  w3fid3  w1a1    w1a2    w1a3    w1b1    w1b2    w1b3    w1c1    w1c2    w1c3    w2a1    w2a2    w2a3    w2b1    w2b2    w2b3    w2c1    w2c2    w2c3    w3a1    w3a2    w3a3    w3b1    w3b2    w3b3    w3c1    w3c2    w3c3
1   10  3   9   4   13  9   15  14  NA  1   1   0   0   1   0   1   0   0   1   1   1   1   0   0   1   1   1   1   1   NA  1   0   NA  0   0   NA
2   13  NA  20  4   18  19  12  10  8   1   NA  0   0   NA  0   0   NA  0   1   0   0   1   1   0   0   1   0   0   0   0   1   0   0   1   0   0
3   NA  9   19  19  8   20  17  9   2   NA  1   1   NA  0   0   NA  0   0   0   0   1   0   0   0   1   0   0   0   0   1   0   0   0   1   1   0
4   13  19  9   19  16  2   2   11  14  1   1   1   1   1   1   1   1   0   0   0   1   0   0   1   0   0   0   0   0   0   0   0   1   0   1   0
5   11  8   9   18  20  7   9   17  12  0   1   1   0   0   0   1   1   1   1   0   1   1   1   0   1   0   1   1   0   0   1   1   1   0   1   0", header = TRUE)

Desired output:

df <- read.table(text= "id   wave   fid a   b   c
1   1   10  1   0   1
1   1   3   1   1   0
1   1   9   0   0   0
1   2   4   1   1   1
1   2   13  1   0   1
1   2   9   1   0   1
1   3   15  1   1   0
1   3   14  1   0   0
1   3   NA  NA  NA  NA
2   1   14  1   0   0
2   1   NA  NA  NA  NA
2   1   20  0   0   0
2   2   4   1   1   0
2   2   18  0   1   1
2   2   19  0   0   0
2   3   12  0   1   1
2   3   10  0   0   0
2   3   8   0   0   0
3   1   NA  NA  NA  NA
3   1   9   1   0   0
3   1   19  1   0   0
3   2   19  0   0   1
3   2   8   0   0   0
3   2   20  1   0   0
3   3   17  0   0   1
3   3   9   0   0   1
3   3   2   1   0   0
4   1   13  1   1   1
4   1   19  1   1   1
4   1   9   1   1   0
4   2   19  0   0   0
4   2   16  0   0   0
4   2   2   1   1   0
4   3   2   0   0   0
4   3   11  0   0   1
4   3   14  0   1   0
5   1   11  0   0   1
5   1   8   1   0   1
5   1   9   1   0   1
5   2   18  1   1   1
5   2   20  0   1   0
5   2   7   1   0   1
5   3   9   1   1   0
5   3   17  0   1   1
5   3   12  0   1   0", header = TRUE)

I have used this code:

df %>% pivot_longer(-id, 
                     names_to = c('wave', 'type', '.value'),
                     names_pattern = "(\\w+\\d+)(\\w+)(\\d+)") 

Which gave me this:

id wave  type    `1`   `2`   `3`
1 w1    fid      10     3     9
1 w2    fid       4    13     9
1 w3    fid      15    14    NA
1 w1    a         1     1     0
1 w1    b         0     1     0
1 w1    c         1     0     0
1 w2    a         1     1     1
1 w2    b         1     0     0
1 w2    c         1     1     1
1 w3    a         1     1    NA
1 w3    b         1     0    NA
1 w3    c         0     0    NA
2 w1    fid      13    NA    20
2 w2    fid       4    18    19
2 w3    fid      12    10     8
2 w1    a         1    NA     0
2 w1    b         0    NA     0
2 w1    c         0    NA     0
2 w2    a         1     0     0
2 w2    b         1     1     0
2 w2    c         0     1     0
2 w3    a         0     0     0
2 w3    b         1     0     0
2 w3    c         1     0     0
3 w1    fid      NA     9    19
3 w2    fid      19     8    20
3 w3    fid      17     9     2
3 w1    a        NA     1     1
3 w1    b        NA     0     0
3 w1    c        NA     0     0
3 w2    a         0     0     1
3 w2    b         0     0     0
3 w2    c         1     0     0
3 w3    a         0     0     1
3 w3    b         0     0     0
3 w3    c         1     1     0
4 w1    fid      13    19     9
4 w2    fid      19    16     2
4 w3    fid       2    11    14
4 w1    a         1     1     1
4 w1    b         1     1     1
4 w1    c         1     1     0
4 w2    a         0     0     1
4 w2    b         0     0     1
4 w2    c         0     0     0
4 w3    a         0     0     0
4 w3    b         0     0     1
4 w3    c         0     1     0
5 w1    fid      11     8     9
5 w2    fid      18    20     7
5 w3    fid       9    17    12
5 w1    a         0     1     1
5 w1    b         0     0     0
5 w1    c         1     1     1
5 w2    a         1     0     1
5 w2    b         1     1     0
5 w2    c         1     0     1
5 w3    a         1     0     0
5 w3    b         1     1     1
5 w3    c         0     1     0

I assume I have to rewrite names_to and names_pattern to get what I want, but I'm stuck on the proper syntax to get there.


Solution

  • You can use names_pattern and the special value .value in names_to:

    library(tidyverse)
    
    df %>%
      pivot_longer(cols = -id, 
                   names_pattern = '^w(\\d)(\\D+)\\d$', 
                   names_to = c('wave', '.value'))
    #>    id wave fid  a  b  c
    #> 1   1    1  10  1  0  1
    #> 2   1    1   3  1  1  0
    #> 3   1    1   9  0  0  0
    #> 4   1    2   4  1  1  1
    #> 5   1    2  13  1  0  1
    #> 6   1    2   9  1  0  1
    #> 7   1    3  15  1  1  0
    #> 8   1    3  14  1  0  0
    #> 9   1    3  NA NA NA NA
    #> 10  2    1  13  1  0  0
    #> 11  2    1  NA NA NA NA
    #> 12  2    1  20  0  0  0
    #> 13  2    2   4  1  1  0
    #> 14  2    2  18  0  1  1
    #> 15  2    2  19  0  0  0
    #> 16  2    3  12  0  1  1
    #> 17  2    3  10  0  0  0
    #> 18  2    3   8  0  0  0
    #> 19  3    1  NA NA NA NA
    #> 20  3    1   9  1  0  0
    #> 21  3    1  19  1  0  0
    #> 22  3    2  19  0  0  1
    #> 23  3    2   8  0  0  0
    #> 24  3    2  20  1  0  0
    #> 25  3    3  17  0  0  1
    #> 26  3    3   9  0  0  1
    #> 27  3    3   2  1  0  0
    #> 28  4    1  13  1  1  1
    #> 29  4    1  19  1  1  1
    #> 30  4    1   9  1  1  0
    #> 31  4    2  19  0  0  0
    #> 32  4    2  16  0  0  0
    #> 33  4    2   2  1  1  0
    #> 34  4    3   2  0  0  0
    #> 35  4    3  11  0  0  1
    #> 36  4    3  14  0  1  0
    #> 37  5    1  11  0  0  1
    #> 38  5    1   8  1  0  1
    #> 39  5    1   9  1  0  1
    #> 40  5    2  18  1  1  1
    #> 41  5    2  20  0  1  0
    #> 42  5    2   7  1  0  1
    #> 43  5    3   9  1  1  0
    #> 44  5    3  17  0  1  1
    #> 45  5    3  12  0  1  0