Search code examples
rdataframedplyrreshape

How do I rearrange this data frame in R?


I am sure this has been answered, but I am having difficulty having things make sense in my specific use-case.

I have this table:

> data.frame(ID = c(1, 2, 3, 4, 5, 6, 7, 8, 9),
>            Class1 = c("AB", "AB", "BC", "BC", "BC", "CD", "CD", "AB", "BC"),
>            Value1 = c(80, 90, 50, 45, 65, 30, 75, 70, 80),
>            Class2 = c("BC", "BC", "CD", "DQ", "CD", "AB", "AB", NA, NA),
>            Value2 = c(10, 5, 30, 30, 20, 10, 5, NA, NA), 
>            Class3 = c("AF", "DQ", "AB", "CD", "AB", NA, NA, NA, NA),
>            Value3 = c(5, 5, 20, 15, 15, NA, NA, NA, NA), 
>            Class4 = c("DQ", NA, NA, NA, NA, NA, NA, NA, NA),
>            Value4 = c(5, NA, NA, NA, NA, NA, NA, NA, NA))

  ID Class1 Value1 Class2 Value2 Class3 Value3 Class4 Value4
1  1     AB     80     BC     10     AF      5     DQ      5
2  2     AB     90     BC      5     DQ      5   <NA>     NA
3  3     BC     50     CD     30     AB     20   <NA>     NA
4  4     BC     45     DQ     30     CD     15   <NA>     NA
5  5     BC     65     CD     20     AB     15   <NA>     NA
6  6     CD     30     AB     10   <NA>     NA   <NA>     NA
7  7     CD     75     AB      5   <NA>     NA   <NA>     NA
8  8     AB     70   <NA>     NA   <NA>     NA   <NA>     NA
9  9     BC     80   <NA>     NA   <NA>     NA   <NA>     NA

I want to pull out all the class values as rows and their corresponding value associated with the correct ID. I need the final table to look like this:

  Class   ID_1   ID_2   ID_3   ID_4   ID_5   ID_6   ID_7   ID_8   ID_9
1    AB     80     90     20      0     15     10      5     70      0
2    BC     10      5     50     45     65      0      0      0     80
3    CD      0      0     30     15     20     30     75      0      0
4    DQ      5      5      0     30      0      0      0      0      0
5    AF      5      0      0      0      0      0      0      0      0

I have made it as far as transposing the table and creating the correct column titles, but I cannot figure out how to extract the Classes to it's own column? This may be the completely wrong direction, as well...

> data <- data.frame(ID = c(1, 2, 3, 4, 5, 6, 7, 8, 9),
>            Class1 = c("AB", "AB", "BC", "BC", "BC", "CD", "CD", "AB", "BC"),
>            Value1 = c(80, 90, 50, 45, 65, 30, 75, 70, 80),
>            Class2 = c("BC", "BC", "CD", "DQ", "CD", "AB", "AB", NA, NA),
>            Value2 = c(10, 5, 30, 30, 20, 10, 5, NA, NA), 
>            Class3 = c("AF", "DQ", "AB", "CD", "AB", NA, NA, NA, NA),
>            Value3 = c(5, 5, 20, 15, 15, NA, NA, NA, NA), 
>            Class4 = c("DQ", NA, NA, NA, NA, NA, NA, NA, NA),
>            Value4 = c(5, NA, NA, NA, NA, NA, NA, NA, NA))
> data_t <- as.data.frame(t(data), row) 
> colnames(data_t) <- paste0("ID_", unlist(data_t[1, ]))
> data_t <- data_t[-1, ]

> data_t
  ID_1 ID_2 ID_3 ID_4 ID_5 ID_6 ID_7 ID_8 ID_9
2   AB   AB   BC   BC   BC   CD   CD   AB   BC
3   80   90   50   45   65   30   75   70   80
4   BC   BC   CD   DQ   CD   AB   AB <NA> <NA>
5   10    5   30   30   20   10    5 <NA> <NA>
6   AF   DQ   AB   CD   AB <NA> <NA> <NA> <NA>
7    5    5   20   15   15 <NA> <NA> <NA> <NA>
8   DQ <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
9    5 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>

Solution

  • Not straightforward but we can use the pivot functions of tidyverse to get the job done:

    library(tidyverse)
    
    reshaped_data <- data %>% 
      pivot_longer(
        cols = -ID,
        names_to = c(".value", "group"),
        names_pattern = "(Class|Value)(\\d+)",
        values_drop_na = TRUE
      ) %>%
      group_by(Class, ID) %>%
      summarise(Value = sum(Value, na.rm = TRUE)) %>%
      pivot_wider(names_from = ID, values_from = Value, names_prefix = "ID_") %>%
      replace(is.na(.), 0)
    
    print(reshaped_data)
    

    Which give you the required output:

      Class    ID_1  ID_2  ID_3  ID_5  ID_6  ID_7  ID_8  ID_4  ID_9
    1 AB       80    90    20    15    10     5    70     0     0
    2 AF        5     0     0     0     0     0     0     0     0
    3 BC       10     5    50    65     0     0     0    45    80
    4 CD        0     0    30    20    30    75     0    15     0
    5 DQ        5     5     0     0     0     0     0    30     0