Search code examples
rdataframepivottidyr

How to pivot specific columns in R?


I'm fed up of trying, so brownie points for anyone who helps me. I've got the following data:

df <- structure(list(type = c("a", "b", "c", "d", "e", "f", "g", "h", 
"i", "j", "k"), female = c(NA, 3, NA, 1, NA, 2, 2, 1, 16, 43, 
10), prop_total_female = c(NA, 0.04, NA, 0.01, NA, 0.03, 0.03, 
0.01, 0.21, 0.55, 0.13), male = c(2, NA, 2, NA, 1, 8, 1, NA, 
11, 34, 12), prop_total_male = c(0.03, NA, 0.03, NA, 0.01, 0.11, 
0.01, NA, 0.15, 0.48, 0.17)), row.names = c(NA, -11L), class = c("tbl_df", 
"tbl", "data.frame"))

type  female   prop_total_female    male    prop_total_male
   a      NA                  NA       2               0.03
   b       3                0.04      NA                 NA   
   c      NA                  NA       2               0.03
   d       1                0.01      NA                 NA   
   e      NA                  NA       1               0.01
   f       2                0.03       8               0.11
   g       2                0.03       1               0.01
   h       1                0.01      NA                 NA   
   i      16                0.21      11               0.15
   j      43                0.55      34               0.48
   k      10                0.13      12               0.17

I want to pivot the data to get one column called gender, from male and female respectively, and another column called prop from prop_total_female and prop_total_male. So the desired dataframe will have three columns: type, gender, and prop. A tidyverse solution is preferred.

And to @user438383 who will inevitably mark my question as a duplicate, at least give me a solution before resuming your idle life.


Solution

  • You can use names_pattern and the corresponding .value in names_to when you have dynamic column names you want to pivot. Because your name pattern is slightly peculiar (it has only one group for the male/female columns, i.e. there is no _ and prefix), the pattern will add some NAs that you can remove.

    If you wish to keep the NAs that were in the original df, you can use filter(!is.na(gender)) instead.

    library(dplyr)
    library(tidyr)
    df %>% 
      pivot_longer(-type, names_pattern = "(prop_total)_(.*)", 
                   names_to = c(".value", "gender")) %>% 
      filter(!is.na(prop_total))
    
    # # A tibble: 16 × 3
    #    type  gender prop_total
    #    <chr> <chr>       <dbl>
    #  1 a     male         0.03
    #  2 b     female       0.04
    #  3 c     male         0.03
    #  4 d     female       0.01
    #  5 e     male         0.01
    #  6 f     female       0.03
    #  7 f     male         0.11
    #  8 g     female       0.03
    #  9 g     male         0.01
    # 10 h     female       0.01
    # 11 i     female       0.21
    # 12 i     male         0.15
    # 13 j     female       0.55
    # 14 j     male         0.48
    # 15 k     female       0.13
    # 16 k     male         0.17