Search code examples
rdataframemultiple-columnsreshape2columnsorting

Stack multiple columns in one based on column name


My data frame consists of 53 observations of 269 variables. The structure is similar to this:

   id            w.0 v.0 a.0         w.1 v.1 a.1            
1   1           here   7   5         wor   1   7         
4   4            are   6   8          ds   5   4        
7   7           some   7   2         hey   3   8           

The column names go up to w.26, v.26, a.26.

My task is to make three main columns: words, va, ac and, therefore, a long format data frame, similar to this:

   id          words  va  ac                     
1   1           here   7   5                  
2   2            are   6   8                  
3   3           some   7   2           
4   4            wor   1   7
5   5             ds   5   4
6   6            hey   3   8 

I used this code:

df_reshaped <- reshape(subset(df), 
              varying=Map(function(x) paste(c("w","v", "a"), x, sep="."), 
                          c("0","1","2","3","4","5","6","7","8","9","10","11","12","13","14",
                            "14","15","16","17","18","19","20","21","22","23","24","25","26")),
              v.names=c("words","va","ac"),
              idvar= "id",    
              direction="long")

However, it stacked columns incorrectly (the structure is similar to this):

     id time  words   va   ac
1.1   1    1   here  are  some
4.1   4    1    wor   ds   hey
7.1   7    2      7    6     7
8.1   8    2      1    5     3
10.1 10    3      5    8     2
11.1 11    3      7    4     8

Any help would be appreciated. Thank you.


Solution

  • We can use pivot_longer

    library(dplyr)
    library(tidyr)
    df1 %>% 
      pivot_longer(cols = -id, names_to = c('.value', 'group'), names_sep="\\.")%>% 
      select(-group) %>% 
      rename_at(-1, ~ c('words', 'va', 'ac'))
    

    data

    df1 <- structure(list(id = c(1L, 4L, 7L), w.0 = c("here", "are", "some"
    ), v.0 = c(7L, 6L, 7L), a.0 = c(5L, 8L, 2L), w.1 = c("wor", "ds", 
    "hey"), v.1 = c(1L, 5L, 3L), a.1 = c(7L, 4L, 8L)), class = "data.frame",
    row.names = c("1", 
    "4", "7"))