Search code examples
rdataframesplituniquekey-value

How to extract all columns of a data frame which contain a specific value in the first row in R


I have a data frame with measurements for multiple experiments. I would like to separate the data frame into multiple data frames each containing measurements from one experiment.

g1,g2,etc. are different molecules that i measure and i would like to keep this information as row names.

df1

a1 a2 a3 a4 a5 a6
g1 0 5 7 2 1 9
g2 5 6 1 3 7 1

I have then another data frame with information on which experiment belongs to which measurement.

df2

MeasurmentNr ExperimentNr
a1 e1
a2 e1
a3 e2
a4 e2
a5 e3
a6 e3

How to separate the data frame into multiple data frames with the same format?

I tried to do to it by matching the values, but that didn't work.

exp <- split(dataframe,factor(c("e1","e2","e3"), levels=c("e1","e2","e3")))

What i ended up doing is not an elegant solution and it also has many steps while i am sure it can be done in one step. I'm not good in R so i would be really happy if someone could explain to be a more elegant solution.

Here is my solution. I first reshaped the df2, than merged it with df1 and lastly split the columns to a new list where each experiment is a separate df.

test <- melt(as.data.frame(df2), id.vars = "MeasurmentNr")
colnames(test) <- test[1,]
test <- test[-1, ]
newnew <- rbind(test,df1)

new_df <- split.default(newnew, sub('\\+', '', newnew[1,]))

Solution

  • It creates a separate data frame for each experiment.

    text="a1 a2 a3 a4 a5 a6
    g1 0 5 7 2 1 9
    g2 5 6 1 3 7 1"
    df1=read.table(text=text, header=TRUE, stringsAsFactors = FALSE, row.names=1)
    text="MeasurmentNr ExperimentNr
    a1 e1
    a2 e1
    a3 e2
    a4 e2
    a5 e3
    a6 e3"
    df2=read.table(text=text, header=TRUE, stringsAsFactors = FALSE)
    
    library(dplyr)
    lapply(unique(df2$ExperimentNr), function(x) {
      filter(df2, ExperimentNr==x) %>% .$MeasurmentNr %>% df1[,.]
    })
    
    [[1]]
       a1 a2
    g1  0  5
    g2  5  6
    
    [[2]]
       a3 a4
    g1  7  2
    g2  1  3
    
    [[3]]
       a5 a6
    g1  1  9
    g2  7  1