Search code examples
rtransposemelt

Transposing a dataframe and using the first column as an index


I have a dataframe that is structured like below, where A/B/C/D are different treatment methods:

input <- read.table(text="
filename wavelength  A    B    C    D
file1      w1        NA   NA   1   2
file1      w2        NA   NA   3   2
file1      w3        NA   NA   6   2
file2      w1        3    4    NA   NA
file2      w2        4    8    NA   NA
file2      w3        6    1    NA   NA", header=TRUE)

And I would like for it to be transposed so that wavelength is the header and treatments are rows with the filenames duplicated each time:

desired <- read.table(text="
filename Method   w1    w2   w3 
 file1      C     1    3     6
 file1      D     2    2     2
 file2      A     3    4     6 
 file2      B     4    8     1", header=TRUE)

I've tried melt/cast from reshape2, melt from the data.table package, gather/spread, t - everything I can think of. The actual data frame in the end will be about 500 rows by 3500 columns - so I would prefer not to call out any specific column or method names. My issue seems mainly to be that I can't call all method columns under one value and use it to melt:

colMethods <- myData[, 2:length(myData)]

A lot of times I don't get an error, but the dataframe R returns is just a list of wavelengths and a column that says 'wavelength'. How would any of you approach this? Thanks!


Solution

  • You can try this:

    library(tidyverse)
    #Data
    df <- structure(list(filename = c("file1", "file1", "file1", "file2", 
    "file2", "file2"), wavelength = c("w1", "w2", "w3", "w1", "w2", 
    "w3"), A = c(NA, NA, NA, 3L, 4L, 6L), B = c(NA, NA, NA, 4L, 8L, 
    1L), C = c(1L, 3L, 6L, NA, NA, NA), D = c(2L, 2L, 2L, NA, NA, 
    NA)), class = "data.frame", row.names = c(NA, -6L))
    

    Code:

    df %>% pivot_longer(cols = -c(1,2)) %>% filter(!is.na(value)) %>%
      pivot_wider(names_from = wavelength,values_from = value)
    

    Output:

    # A tibble: 4 x 5
      filename name     w1    w2    w3
      <chr>    <chr> <int> <int> <int>
    1 file1    C         1     3     6
    2 file1    D         2     2     2
    3 file2    A         3     4     6
    4 file2    B         4     8     1