
How to perform looping in R and export data


We have generated pilot gene expression data with one sample per condition as a test run. I have a control (baseline) sample and 5 other samples, and I am analysing them with the edgeR package in R. I want to set the control sample as the baseline, calculate logFC, logCPM, and PValue for each of the other samples against it, and export et$table to a csv file for each comparison: Control vs Sample_1, Control vs Sample_2, and so on up to Control vs Sample_5. How do I loop over all comparisons and export the data? We plan to analyse hundreds of samples across multiple conditions, so a loop will make this easier to apply to the larger datasets later.

Thank you,

Toufiq

Input data

dput(Counts_Test)
structure(list(Control = c(0L, 184L, 60L, 0L, 7L, 0L, 87L, 0L, 
0L, 21L, 193L, 29L, 0L, 0L, 3L, 50L, 0L, 325L, 442L), Sample_1 = c(0, 
140.5, 64, 0, 4, 0, 83, 0, 1, 51.5, 199, 25, 0, 0, 5, 62, 0, 
525, 407), Sample_2 = c(0, 169, 45, 1, 3, 0, 122, 0, 0, 36.5, 
179, 20, 0, 0, 1, 58, 0, 494, 570), Sample_3 = c(0L, 107L, 67L, 
0L, 5L, 0L, 99L, 0L, 0L, 63L, 178L, 34L, 0L, 0L, 2L, 60L, 0L, 
467L, 283L), Sample_4 = c(0L, 221L, 44L, 0L, 1L, 0L, 139L, 0L, 
0L, 48L, 222L, 24L, 1L, 0L, 5L, 67L, 0L, 612L, 451L), Sample_5 = c(0, 
120.5, 45, 1, 1, 0, 100, 0, 0, 44.5, 202, 39, 1, 0, 3, 76, 0, 
719, 681)), class = "data.frame", row.names = c("Gene1", "Gene2", 
"Gene3", "Gene4", "Gene5", "Gene6", "Gene7", "Gene8", "Gene9", 
"Gene10", "Gene11", "Gene12", "Gene13", "Gene14", "Gene15", "Gene16", 
"Gene17", "Gene18", "Gene19"))


dput(Sample_Grouping)
structure(list(SampleID = c("xx-xx-1551", "xx-xx-1548", "xx-xx-1549", 
"xx-xx-1550", "xx-xx-1552", "xx-xx-0093"), ID = c("Control", 
"Sample_1", "Sample_2", "Sample_3", "Sample_4", "Sample_5")), class = "data.frame", row.names = c("Control", 
"Sample_1", "Sample_2", "Sample_3", "Sample_4", "Sample_5"))

The R code for a single comparison and its output are given below:

library(edgeR)
bcv <- 0.4
y <- DGEList(counts=Counts_Test, group=Sample_Grouping$ID)
et <- exactTest(y, dispersion=bcv^2)
View(et$table)
# write.csv always uses a comma separator; passing sep only triggers a warning
write.csv(et$table, file="./table_Control_vs_Sample_1.csv")

Contents of et$table exported for the Control vs Sample_1 comparison:

dput(et$table)
structure(list(logFC = c(2.56274120305193e-15, -0.550254150196693, 
-0.0683012466357368, 2.56274120305193e-15, -0.946184736817423, 
2.56274120305193e-15, -0.229115434595494, 2.56274120305193e-15, 
3.1003487058189, 1.12824320628868, -0.117305916091926, -0.373934941685134, 
2.56274120305193e-15, 2.56274120305193e-15, 0.557345502442179, 
0.148458971210359, 2.56274120305193e-15, 0.530168392608693, -0.280483136036388
), logCPM = c(10.2398977391586, 16.5699363191385, 15.0947320625249, 
10.459922538352, 11.7193421876963, 10.2398977391586, 15.9893772819593, 
10.2398977391586, 10.3561232675437, 14.7852593647762, 16.8836544336529, 
14.184917355539, 10.4590444918373, 10.2398977391586, 11.6104987130447, 
15.2490804549745, 10.2398977391586, 18.2653428429955, 18.1145148472016
), PValue = c(1, 0.523490569569589, 0.978749897705603, 1, 0.6666864407408, 
1, 0.797857807297049, 1, 1, 0.227758918677035, 0.896097547311082, 
0.732358557879292, 1, 1, 0.788137722865551, 0.88329454414985, 
1, 0.532994222767919, 0.743046444999486)), class = "data.frame", row.names = c("Gene1", 
"Gene2", "Gene3", "Gene4", "Gene5", "Gene6", "Gene7", "Gene8", 
"Gene9", "Gene10", "Gene11", "Gene12", "Gene13", "Gene14", "Gene15", 
"Gene16", "Gene17", "Gene18", "Gene19"))

Expected output

Likewise, I would like to export a csv file of et$table for each comparison (Control vs Sample_1, Control vs Sample_2, Control vs Sample_3, Control vs Sample_4, Control vs Sample_5), with the sample name added as a suffix to the output file name.


Solution

  • Is this something like what you want?

    sample_names <- c("Sample_1", "Sample_2", "Sample_3", "Sample_4", "Sample_5")
    for(cur_name in sample_names){
      ...
      write.csv(et$table, file=paste0("./table_Control_vs_", cur_name, ".csv"))
    }
    

    The ... indicates the lines where you run edgeR for each comparison.

    For each iteration, you can use the pair argument in the exactTest function to specify the comparison. So, your full code may look like this:

    sample_names <- c("Sample_1", "Sample_2", "Sample_3", "Sample_4", "Sample_5")
    library(edgeR)
    bcv <- 0.4
    y <- DGEList(counts=Counts_Test, group=Sample_Grouping$ID)
    for(cur_name in sample_names){
      et <- exactTest(y, pair=c("Control", cur_name), dispersion=bcv^2)
      write.csv(et$table, file=paste0("./table_Control_vs_", cur_name, ".csv"))
    }
    

    Note that the first element in the pair argument is taken as the baseline for the comparison by exactTest.
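
Since the question mentions scaling to hundreds of samples, the hard-coded sample_names vector could instead be derived from the grouping table. The following is only a sketch using base R: it assumes the baseline group is labelled "Control" as in the question, and the names baseline, out_dir, and out_files are made up for illustration.

```r
# Grouping table with the same structure as Sample_Grouping in the question
Sample_Grouping <- data.frame(
  SampleID = c("xx-xx-1551", "xx-xx-1548", "xx-xx-1549",
               "xx-xx-1550", "xx-xx-1552", "xx-xx-0093"),
  ID = c("Control", "Sample_1", "Sample_2", "Sample_3", "Sample_4", "Sample_5"),
  stringsAsFactors = FALSE
)

# Assumption: the baseline group is labelled "Control"
baseline <- "Control"

# Every group label except the baseline becomes one comparison
sample_names <- setdiff(unique(Sample_Grouping$ID), baseline)

# One output path per comparison; out_dir is an assumed output location
out_dir <- "."
out_files <- file.path(out_dir,
                       paste0("table_", baseline, "_vs_", sample_names, ".csv"))
names(out_files) <- sample_names
```

Inside the loop, pair=c(baseline, cur_name) and file=out_files[[cur_name]] could then replace the literal strings, so that adding samples to the grouping table is enough to add comparisons.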

    You may reply if this does not solve your problem, or select the answer if it does.

    As a further addition, in response to your follow-up question: you can enforce the same gene order in every exported file this way:

    sample_names <- c("Sample_1", "Sample_2", "Sample_3", "Sample_4", "Sample_5")
    library(edgeR)
    bcv <- 0.4
    y <- DGEList(counts=Counts_Test, group=Sample_Grouping$ID)
    for(cur_name in sample_names){
      et <- exactTest(y, pair=c("Control", cur_name), dispersion=bcv^2)
      if(cur_name==sample_names[1]){
        # In the first iteration, capture the order
        geneOrder <- row.names(et$table)
      }else{
        # In the subsequent iterations, enforce the order
        et$table <- et$table[geneOrder,]
      }
      # Now, you can write
      write.csv(et$table, file=paste0("./table_Control_vs_", cur_name, ".csv"))
    }
    # The if/else statement will ensure the order is the same for all
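
To confirm after the fact that the exported files really share one gene order, the row names can be read back and compared. This is a minimal sketch in base R, not part of edgeR: same_gene_order is a hypothetical helper, and the small tables below are made-up demonstration data rather than real results.

```r
# Hypothetical helper: TRUE if every CSV lists its rows (genes) in the same order
same_gene_order <- function(files) {
  # row.names = 1 reads the first column (the row names written by write.csv)
  gene_lists <- lapply(files, function(f) row.names(read.csv(f, row.names = 1)))
  all(vapply(gene_lists, identical, logical(1), gene_lists[[1]]))
}

# Demonstration with made-up tables written to a temporary directory
tmp <- tempfile("edgeR_tables_")
dir.create(tmp)
demo <- data.frame(logFC = c(0.5, -1.2, 0), PValue = c(0.04, 0.2, 1),
                   row.names = c("Gene1", "Gene2", "Gene3"))
f1 <- file.path(tmp, "table_Control_vs_Sample_1.csv")
f2 <- file.path(tmp, "table_Control_vs_Sample_2.csv")
f3 <- file.path(tmp, "table_shuffled.csv")
write.csv(demo, f1)
write.csv(demo, f2)
write.csv(demo[c("Gene3", "Gene1", "Gene2"), ], f3)  # deliberately reordered

ok_same <- same_gene_order(c(f1, f2))      # both files use the same order
ok_shuffled <- same_gene_order(c(f1, f3))  # the third file is reordered
```

In practice the helper would be called on the vector of file names written by the loop, for example the paths built with paste0 above.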