
How do I reshape data into long format in R when I have multiple 'varying' variables?


I am working with a dataset in wide format that I would like to transform into long format for statistical analyses (linear models). However, I am stuck because I have multiple variables that 'change' (for lack of a better word), i.e. 'varying' variables. Let me try to explain using some mock data:

[Image: mock data in wide format]

pptns = participant ID
educ = education level

I had two conditions (exp and CTRL).

In both conditions, I measured an area using 3 different tools (1, 2 and 3) at two different timepoints (before and after), so for each participant there were 12 area measurements in total (2 conditions × 3 tools × 2 timepoints).

I also collected other data, e.g. heart rate (HR) in both conditions at three different timepoints (before, during, after).
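To make that layout concrete, here is a minimal sketch that builds a hypothetical wide data frame following the naming used in the attempt below (area_<condition>_<tool>_<time>). The HR column names (HR_<condition>_<time>), the number of participants and all values are assumptions for illustration only, not the real data.

```r
# Hypothetical mock data in the wide layout described above.
# Assumed naming: area_<condition>_<tool>_<time> and HR_<condition>_<time>;
# the HR names in particular are a guess.
set.seed(1)
n <- 3  # number of participants (arbitrary)

# expand.grid varies its first argument fastest, so the area names come out
# in the same order as in the question: exp 1..3 before, exp 1..3 after, ...
area_grid <- expand.grid(tool = 1:3, time = c("before", "after"),
                         condition = c("exp", "CTRL"), stringsAsFactors = FALSE)
area_cols <- paste("area", area_grid$condition, area_grid$tool, area_grid$time, sep = "_")

hr_grid <- expand.grid(time = c("before", "during", "after"),
                       condition = c("exp", "CTRL"), stringsAsFactors = FALSE)
hr_cols <- paste("HR", hr_grid$condition, hr_grid$time, sep = "_")

# One row per participant; placeholder values only
wide <- data.frame(pptns = 1:n, educ = sample(1:3, n, replace = TRUE))
for (col in area_cols) wide[[col]] <- round(runif(n, 10, 50), 1)
for (col in hr_cols) wide[[col]] <- sample(60:100, n, replace = TRUE)

length(area_cols)  # 2 conditions x 3 tools x 2 timepoints = 12
length(hr_cols)    # 2 conditions x 3 timepoints = 6
```

Each participant is a single row, and condition, tool and timepoint are encoded only in the column names; that is exactly the information a long format has to move into columns of its own.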

Now my question is: how do I reshape a wide data frame that has condition, tool, timepoint_of_area and timepoint_of_HR encoded in its column names as 'varying' variables?

I am also struggling to imagine what this long dataset would look like, but I guess it would be like this:

[Image: mock data in long format]

I've tried to reshape one variable at a time, so that I can turn the wide dataset into a long one step by step. For example, I would first reshape Condition like this:

[Image: mock data after reshaping Condition]

and then, for example, tool.

However, I don't know how to code that in R. This was my attempt:

long_1 <- reshape(wide, direction = 'long',
varying=c('area_exp_1_before', 'area_exp_2_before', 'area_exp_3_before',
'area_exp_1_after', 'area_exp_2_after', 'area_exp_3_after',
'area_CTRL_1_before', 'area_CTRL_2_before', 'area_3_Brush_before',
'area_CTRL_1_after', 'area_CTRL_2_after', 'area_3_Brush_after'),
timevar='Condition',
times=c('exp', 'CTRL'),
v.names=c('1_before', '2_before', '3_before', '1_after', '2_after', '3_after'),
idvar = "pptns",
ids = 1:nrow(all_data)

but this returns an error message.
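For reference, a minimal base-R sketch of just the Condition step, run on a small hypothetical data frame (two participants, random values) whose area columns all follow the area_<condition>_<tool>_<time> pattern. As posted, the attempt is missing its closing parenthesis, and two of its names (area_3_Brush_before and area_3_Brush_after) do not follow that pattern; either could be behind the error. The sketch passes varying as a list in which each element holds the columns, in the same order as times, that collapse into one long column, which makes the column-to-condition mapping explicit.

```r
# Hypothetical wide data: pptns, educ and 12 area columns
set.seed(42)
suffixes <- c("1_before", "2_before", "3_before", "1_after", "2_after", "3_after")
wide <- data.frame(pptns = 1:2, educ = c(2, 3))
for (cond in c("exp", "CTRL")) {
  for (s in suffixes) {
    wide[[paste("area", cond, s, sep = "_")]] <- round(runif(2, 10, 50), 1)
  }
}

# varying as a list: element i = c(exp column, CTRL column) for suffix i,
# i.e. the columns that become the single long column v.names[i]
varying <- lapply(suffixes, function(s) paste("area", c("exp", "CTRL"), s, sep = "_"))

long_1 <- reshape(wide, direction = "long",
                  varying = varying,
                  v.names = paste0("area_", suffixes),
                  timevar = "Condition",
                  times = c("exp", "CTRL"),
                  idvar = "pptns")

long_1  # one row per participant and condition; educ is carried along unchanged
```

The same pattern could be repeated on long_1 for tool and then for the timepoint, but each step needs its own varying list, which is why a single pivot_longer() call is usually less work for this kind of layout.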

I know there are at least three similar Stack Overflow questions that recommend using e.g. patterns or names_to. However, when I try them, I get lost in how to transform some variables while leaving others as they are.
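A minimal tidyr sketch of how names_to and a name pattern can split all three 'varying' parts in one step, assuming the column naming from the attempt above and hypothetical HR columns named HR_<condition>_<time>. Because area (tool × before/after) and HR (before/during/after) have different structures, the sketch pivots them into two separate long tables; columns that are not selected (pptns, educ) are simply repeated on every row.

```r
library(tidyr)

# Hypothetical wide data following the question's naming; random placeholder values
set.seed(7)
area_grid <- expand.grid(tool = 1:3, time = c("before", "after"),
                         condition = c("exp", "CTRL"), stringsAsFactors = FALSE)
area_cols <- paste("area", area_grid$condition, area_grid$tool, area_grid$time, sep = "_")
hr_grid <- expand.grid(time = c("before", "during", "after"),
                       condition = c("exp", "CTRL"), stringsAsFactors = FALSE)
hr_cols <- paste("HR", hr_grid$condition, hr_grid$time, sep = "_")  # assumed names

wide <- data.frame(pptns = 1:2, educ = c(2, 3))
for (col in area_cols) wide[[col]] <- round(runif(2, 10, 50), 1)
for (col in hr_cols) wide[[col]] <- sample(60:100, 2)

# Area: one regex group per long column; tool converted from "1" to 1L
area_long <- pivot_longer(
  wide[c("pptns", "educ", area_cols)],
  cols = starts_with("area_"),
  names_pattern = "area_(exp|CTRL)_(\\d)_(before|after)",
  names_to = c("condition", "tool", "time"),
  names_transform = list(tool = as.integer),
  values_to = "area"
)

# HR: strip the prefix, then split the rest on "_"
hr_long <- pivot_longer(
  wide[c("pptns", "educ", hr_cols)],
  cols = starts_with("HR_"),
  names_prefix = "HR_",
  names_sep = "_",
  names_to = c("condition", "time"),
  values_to = "HR"
)
```

With two participants, area_long has 2 × 12 = 24 rows and hr_long has 2 × 6 = 12 rows. Whether to join the two tables afterwards (for example by pptns, condition and time) depends on the model you want to fit.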

I'm sure I'm doing something wrong with reshape, but I don't know what :`)

Any help would be tremendously appreciated. Thank you a lot in advance!


Solution

  • Since no data was provided, I created some example data. It is probably not exactly the same as your dataset, but I hope it helps you understand the approach.

    It is easier for me to create example data in long format and then turn it into wide format than to create the wide format directly (maybe it is also easier to collect data in long format?).

    # Create example data
    library(tidyr)
    set.seed(123) # so HR and age get the same random values on every full run
    
    # n = number of participants
    n <- 2
    
    l <- list(
      pptn = 1:n,
      condition = c("exp", "CTRL"),
      time_condition = c("before", "after"),
      tool = c(1, 2, 3)
    )
    # Extend this list with extra measurement variables if you want
    measurement_vars <- names(l)[-1] # -1 to exclude pptn
    
    # Create a data.frame with every possible combination
    df <- expand.grid(l, stringsAsFactors = FALSE)
    names(df) <- names(l)
    
    # HR differs per combination (thus for every record in df)
    df$HR <- sample(40:120, nrow(df)) # random data
    
    # Create wide format; all_of() avoids tidyselect's warning about using an external vector
    df_wide <- pivot_wider(df, names_from = all_of(measurement_vars), values_from = "HR")
    # Add an age for every pptn (thus for every record of df_wide)
    df_wide$age <- sample(18:65, nrow(df_wide)) # random data
    # Add height and weight in the same way if you want
    

    Now that we have the df_wide example data, you can change it (back) to long format.

    df_long <- pivot_longer(df_wide, cols=-c("pptn", "age"), names_sep = "_", names_to=measurement_vars, values_to = "HR")
    

    measurement_vars is flexible, but if you add, for instance, height to df_wide in the same way as age, you also need to add it to the excluded columns in the cols argument of pivot_longer(), e.g. cols = -c("pptn", "age", "height").

    I hope this helps you figure things out.
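As a variation on the pivot_longer() call above, a minimal sketch that selects the measurement columns by their name pattern with matches() instead of listing every per-participant column to exclude, so an extra column such as height does not have to be added to cols. It also uses names_transform to turn tool back into an integer, since values recovered from column names come back as character. The height column is a hypothetical addition, not part of the original example.

```r
library(tidyr)

# Recreate df_wide as in the answer above (n = 2 participants)
set.seed(123)
n <- 2
l <- list(
  pptn = 1:n,
  condition = c("exp", "CTRL"),
  time_condition = c("before", "after"),
  tool = c(1, 2, 3)
)
measurement_vars <- names(l)[-1]
df <- expand.grid(l, stringsAsFactors = FALSE)
df$HR <- sample(40:120, nrow(df))
df_wide <- pivot_wider(df, names_from = all_of(measurement_vars), values_from = "HR")
df_wide$age <- sample(18:65, nrow(df_wide))
# Hypothetical extra per-participant column
df_wide$height <- sample(150:200, nrow(df_wide))

# Pivot only the columns named like exp_before_1 / CTRL_after_3;
# pptn, age and height are left as they are
df_long <- pivot_longer(
  df_wide,
  cols = matches("^(exp|CTRL)_"),
  names_sep = "_",
  names_to = measurement_vars,
  names_transform = list(tool = as.integer),  # "1" -> 1L
  values_to = "HR"
)
```

The result has one row per participant, condition, timepoint and tool (2 × 2 × 2 × 3 = 24 rows), with age and height repeated on each row.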