Search code examples
rtidyr

Convert data Frame from long to wide in R


I would like to transform a long data frame (see example) below to a wide format.

My data structure looks something like this

| Month    | ID             |Tx    | F |
| -------- | -------------- |------|---|
| 12       | 1              | -0.1 |1|
| 12       | 1              |  -0.1|2|
| 24       | 1              | 0.1  |3|
| 60       | 1              |  -0.45|4|
| 12       | 2              | -0.34 |1|
| 12       | 2              |  0.123|2|
| 24       | 2              | 0.15 |3|
| 12       | 3              |  0.4|1|
| 12       | 3              | -0.153|2|
| 24       | 3              |  -0.5|3|

My code looks like this

pivot_wider(
  df.supine, 
  names_from = df.supine$Month,
  names_prefix="Month_", 
  values_from = df.supine$Tx
)

I get the following error:

Error in `pivot_wider()`:
! Can't subset columns past the end.
ℹ Locations 60, 96, 60, …, 60, and 96 don't exist.
ℹ There are only 30 columns.
Backtrace:
 1. tidyr::pivot_wider(...)
 2. tidyr:::pivot_wider.data.frame(...)

How can I fix this error?

My guess is, that the problem is, that not all IDs have all Months.


Solution

  • How can I fix this error?

    You only need the column names for names_from and values_from. (@Carl)

    It's because pivot_wider is a tidyverse::dplyr function who requires tidy-select arguments. It's a worth reading.

    This will most likely throw an error:

    #> pivot_wider(
    #    df.supine, 
         names_from = df.supine$Month,
    #    names_prefix="Month_", 
         values_from = df.supine$Tx
    #  )
    
    Error in `pivot_wider()`:
    ! Can't select columns past the end.
    ℹ Locations 12, 12, 24, …, 12, and 12 don't exist.
    ℹ There are only 4 columns.
    

    But this works just fine even without prefixes:

    > pivot_wider(
        df.supine, 
        names_from  = Month,
        values_from = Tx
      )
    
    # A tibble: 9 × 5
         ID     F   `12`  `24`  `60`
      <dbl> <dbl>  <dbl> <dbl> <dbl>
    1     1     1 -0.1   NA    NA   
    2     1     2 -0.1   NA    NA   
    3     1     3 NA      0.1  NA   
    # ℹ 6 more rows
    # ℹ Use `print(n = ...)` to see more rows  
    

    names looks for column names. If you pass supine.df$Month, it will look for columns 12, 12, 24 and so on, not Month!

    If you want to use am external vector as argument, you need a selection-helper like any_of or all_of:

    my_col <- "Month"
    
    > pivot_wider(
        df.supine, 
        names_from  = any_of(my_col),
        values_from = Tx
      )
    
    # A tibble: 9 × 5
         ID     F   `12`  `24`  `60`
      <dbl> <dbl>  <dbl> <dbl> <dbl>
    1     1     1 -0.1   NA    NA   
    2     1     2 -0.1   NA    NA   
    3     1     3 NA      0.1  NA   
    # ℹ 6 more rows
    # ℹ Use `print(n = ...)` to see more rows
    

    Not having all combinations for ID-Month isn't a problem. You can fill the missing values with values_fill:

    > pivot_wider(
        df.supine, 
        names_from  = any_of(my_col),
        values_from = Tx,
        values_fill = 0
      )
    
    # A tibble: 9 × 5
         ID     F   `12`  `24`  `60`
      <dbl> <dbl>  <dbl> <dbl> <dbl>
    1     1     1 -0.1    0     0   
    2     1     2 -0.1    0     0   
    3     1     3  0      0.1   0  
    # ℹ 6 more rows
    # ℹ Use `print(n = ...)` to see more rows
    

    Check the documentation and you'll be fine.