Search code examples
rdataframereshapelong-format-datawide-format-data

R: Wide to long format for dataframe with one single row and >250 columns


I am trying to get my dataframe in a long format. Currently, it's a dataframe with 1 obs. and >250 variables which looks something like this:

user_0_id user_0_name user_0_sex user_1_id user_1_name user_1_sex user_2_id user_2_name user_2_sex
16 miller female 52 smith male 33 frank female

Meanwhile, I just wish to create a dataframe which looks like this:

id name sex
16 miller female
52 smith male
33 frank female

I've tried renaming the column names in the df to match by applying the following code

names(df) <- sub('^users_', '', names(df))
names(df) <- sub('^[0-9.]+', '', names(df))
names(df) <- sub('^_', '', names(df))

Afterwards, I've been trying to switch it to the long format using melt, which only ended up either multiplying the obs. or completely mumbling up the dataframe.

df <- melt(setDT(df), id.vars = c("id","name"), variable.name = "data")

df <- melt(df, id.vars=c("id", "name", "sex"))

This is the first time I've asked a question on stackoverflow, so any advice as to how to make future questions more comprehensible are very welcome! Thanks to anyone that took their time to read this.


Solution

  • library(tidyr)
    
    pivot_longer(
      df,
      cols = everything(),
      names_to = c(".value"),
      names_pattern = "user_\\d+_(.*)"
    )
    
    # A tibble: 3 × 3
         id name   sex   
      <int> <chr>  <chr> 
    1    16 miller female
    2    52 smith  male  
    3    33 frank  female
    

    Where df is:

    df <- read.table(text = 
    'user_0_id  user_0_name user_0_sex  user_1_id   user_1_name user_1_sex  user_2_id   user_2_name user_2_sex
    16  miller  female  52  smith   male    33  frank   female
    ', header = TRUE)