
Grouping mass spectrometry data in R with a loop


I want to put every 3 columns of the data frame into a separate group. I have attached an image of an example data frame; it has 12 columns.

[Image: example data frame]

I have to group the columns based on a pattern. For a 6-column data frame, the user can give a pattern such as (0,0,0,1,1,1), and I split the data frame as follows:

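    data <- as.matrix(input_data)

    id0 <- which(class == 0)   # indices of the columns labelled 0
    id1 <- which(class == 1)   # indices of the columns labelled 1

    data0 <- data[, id0]       # first group
    data1 <- data[, id1]       # second group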

In the same way, I can handle a 12-column data frame.
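The which(class == k) idea above extends to any number of groups by looping over the distinct labels in the pattern. A minimal sketch, assuming a hypothetical 6-column input_data and collecting the groups in a named list instead of separate data0, data1 variables:

```r
# Hypothetical input: 2 rows x 6 columns (two triplicate groups)
input_data <- as.data.frame(matrix(1:12, nrow = 2, ncol = 6))
class <- c(0, 0, 0, 1, 1, 1)            # user-supplied grouping pattern

data <- as.matrix(input_data)

grouped <- list()                        # one element per group label
for (k in unique(class)) {
  idk <- which(class == k)               # column indices belonging to group k
  # drop = FALSE keeps a matrix even if a group has a single column
  grouped[[paste0("data", k)]] <- data[, idk, drop = FALSE]
}

names(grouped)   # "data0" "data1"
grouped$data1    # columns V4, V5, V6 of input_data
```

Because the loop runs over unique(class), the same code works unchanged for 6, 12 or any other number of columns.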

However, the data frame is not fixed. It is a mass spectrometry expression dataset and may have any number of columns, but all samples are in triplicate, so the number of columns is always divisible by 3.

    class <- c(0, 0, 0, 1, 1, 1)
    length(class)              # 6 columns
    x <- length(class) / 3     # number of triplicate groups
    x                          # 2

I tried the code above to find the required number of groups.
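Because every sample is in triplicate, the pattern itself can be generated from the column count instead of being typed in by the user. A sketch, assuming a hypothetical 12-column input_data and a replicate count of 3:

```r
replicates <- 3
input_data <- as.data.frame(matrix(0, nrow = 4, ncol = 12))   # hypothetical 12-column data

n_col <- ncol(input_data)
stopifnot(n_col %% replicates == 0)      # columns must come in complete triplicates

n_groups <- n_col / replicates           # 12 / 3 = 4 groups
class <- rep(0:(n_groups - 1), each = replicates)
class                                    # 0 0 0 1 1 1 2 2 2 3 3 3
```

rep(..., each = replicates) repeats each group label three times in a row, which gives the same kind of pattern as (0,0,0,1,1,1) for any column count.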

So, could anyone suggest a loop for this problem? The data frame may change with every experiment.


Solution

  • Let's write this script using some dummy data:

    df <- data.frame(rep(data.frame(n1 = 0:6, n2 = 0:6), 3))   # dummy data: 7 rows x 6 columns

    stack_script1 <- function(df, replicate_value) {   # df = data frame, replicate_value = no. of replicates given by the user
      # df <- read.csv(file.choose(), header = TRUE)   # optional: pick a CSV file interactively (replaces the df argument)
      df_col <- ncol(df)                                # number of columns in the data frame
      if (df_col %% replicate_value != 0)
        stop("the number of columns must be a multiple of replicate_value")
      groups <- sort(rep(0:((df_col / replicate_value) - 1), replicate_value))  # group label for each column
      id <- list()                                      # empty list for column indices
      for (i in seq_along(unique(groups))) {
        id[[i]] <- which(groups == unique(groups)[i])   # column indices belonging to group i
      }
      names(id) <- paste0("id", unique(groups))         # name the elements id0, id1, ...
      data <- list()                                    # empty list for the grouped data frames
      for (i in seq_along(id)) {
        data[[i]] <- df[, id[[i]], drop = FALSE]        # columns of group i; drop = FALSE keeps a data frame
      }
      names(data) <- paste0("data", unique(groups))     # name the elements data0, data1, ...
      return(data)
    }
    

    Executing the script:

    >stack_script1(df,3)
    

    Note:

    1. This script works for data frames with any number of columns, as long as the number of columns is a multiple of the replicate value.
    2. The line df <- read.csv(file.choose(), header = TRUE) is optional if you have already imported the data into R; when it runs, it replaces the df passed to the function. I prefer to use it because it lets me pick the file from the folder where it is saved without a separate import step.
    3. The output is a list of the grouped data frames. An R function returns a single object, so several data frames have to be returned together in a list.
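Base R can also do this grouping without explicit loops: a data frame is a list of columns, so split.default() splits its columns by a grouping vector and returns a list of data frames. A sketch on the same dummy data; group_columns is a hypothetical helper name, not part of the script above:

```r
# Dummy data as in the answer: 7 rows x 6 columns
df <- data.frame(rep(data.frame(n1 = 0:6, n2 = 0:6), 3))

group_columns <- function(df, replicate_value) {
  n_col <- ncol(df)
  if (n_col %% replicate_value != 0)
    stop("ncol(df) must be a multiple of replicate_value")
  # one label per column: 0 0 0 1 1 1 ... for replicate_value = 3
  groups <- rep(seq_len(n_col / replicate_value) - 1, each = replicate_value)
  # split.default() treats the data frame as a list of columns,
  # so each element of the result is a data frame of one group's columns
  out <- split.default(df, groups)
  names(out) <- paste0("data", names(out))
  out
}

res <- group_columns(df, 3)
names(res)   # "data0" "data1"
```

The explicit loops in the script make each step visible; split.default() trades that for brevity.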
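If separate data0, data1, ... objects are preferred (as in the question), a named list like the one returned by stack_script1 can be unpacked into the workspace with base R's list2env(). A sketch, assuming res holds such a list (built by hand here so the snippet runs on its own):

```r
# Stand-in for the returned list: two groups of three columns each
res <- list(
  data0 = data.frame(a = 1:2, b = 3:4, c = 5:6),
  data1 = data.frame(d = 7:8, e = 9:10, f = 11:12)
)

# Option 1: keep the list and apply a function to every group at once
sapply(res, ncol)                           # named vector: data0 = 3, data1 = 3

# Option 2: copy each element into its own variable (data0, data1)
invisible(list2env(res, envir = globalenv()))
exists("data0")                             # TRUE
```

Keeping the list is usually more convenient when the number of groups changes between experiments, since the code never needs to know how many dataN variables exist.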