Search code examples
rdata-wranglingtime-measurement

R: How to recode a wide data frame with repeating variables (time) into to long data?


My data frame consists of several hundred persons (rows) answering 4 * 90 items each. It’s the same 90 questions across four points in time per person. Each person belongs to one of two groups. I checked for measurement invariance across these groups, this is easy with multigroup SEM and data in the wide format. Here is a shortened, reproducible example of my initial data frame:

df <- data.frame (ID = c(1,2,3,4,5),
                  Item1_time1 = c(4,4,5,4,3),
                  Item2_time1 = c(3,4,3,5,3),
                  Item1_time2 = c(5,4,4,5,4),
                  Item2_time2 = c(3,3,4,3,5),
                  group = c(0,1,0,1,0)
)

print(df)
  ID Item1_time1 Item2_time1 Item1_time2 Item2_time2 group
1  1           4           3           5           3     0
2  2           4           4           4           3     1
3  3           5           3           4           4     0
4  4           4           5           5           3     1
5  5           3           3           4           5     0

However, now I want to check for measurement invariance across points in time. To do this, this is what the desired output should look like:

ID  time   Item1   Item2    group
1    1       4       3        0
1    2       5       3        0
2    1       4       4        1
2    2       4       3        1
3    1       5       3        0
3    2       4       4        0
4    1       4       5        1
4    2       5       3        1
5    1       3       3        0
5    2       4       5        0

In order to get this, I tried to convert my initial wide to long data, but I don’t really get the desired output:

library(data.table)
long <- melt(setDT(df), id.vars = c("ID"), variable.name = "time")
print(long)

    ID        time value
 1:  1 Item1_time1     4
 2:  2 Item1_time1     4
 3:  3 Item1_time1     5
 4:  4 Item1_time1     4
 5:  5 Item1_time1     3
 6:  1 Item2_time1     3
 7:  2 Item2_time1     4
 8:  3 Item2_time1     3
 9:  4 Item2_time1     5
10:  5 Item2_time1     3
11:  1 Item1_time2     5
12:  2 Item1_time2     4
13:  3 Item1_time2     4
...

How can I code “collapsed” items so that they are not listed separately, but each in its corresponding time category (see desired output above)?


Solution

  • We can use pivot_longer with names_pattern to capture the substring ((...)) from the column names to create the .value column and the 'time' column

    library(tidyr)
    pivot_longer(df, cols = -c(ID, group), names_to = c(".value", "time") , 
          names_pattern = "(\\w+)_\\D+(\\d+)")
    

    -output

    # A tibble: 10 × 5
          ID group time  Item1 Item2
       <dbl> <dbl> <chr> <dbl> <dbl>
     1     1     0 1         4     3
     2     1     0 2         5     3
     3     2     1 1         4     4
     4     2     1 2         4     3
     5     3     0 1         5     3
     6     3     0 2         4     4
     7     4     1 1         4     5
     8     4     1 2         5     3
     9     5     0 1         3     3
    10     5     0 2         4     5
    

    The (\\w+) - captures one or more characters followed by the _ in the column name, and then any non-digits (\\D+) followed by the second capture group of one or more digits ((\\d+)), which corresponds to the .value (column values) and 'time' gets the digits suffix from the column name