r rows probability estimation

Getting coincident information between rows


I would like to estimate, for each probe (row), the fraction of samples (a value between 0 and 1; the samples are the Sam columns) that have exactly the same value in the probe above and in the probe below (Abo and Bel in dfout, respectively). I would also like to list which samples coincide with the probe above and with the probe below (SamsA and SamsB in dfout, respectively).

The input df:

df <-  "Sam1  Sam2 Sam3 Sam4 Sam5
Prb1  0       0    1    2    3
Prb2  0       0    1    2    2
Prb3  0       1    1    2    2
Prb4  2       2    3    2    2"

df <- read.table(text=df, header=TRUE)

The expected output dfout:

dfout <-  "Abo Bel SamsA               SamsB
Prb1       NA  0.8 NA                  Sam1-Sam2-Sam3-Sam4
Prb2       0.8 0.8 Sam1-Sam2-Sam3-Sam4 Sam1-Sam3-Sam4-Sam5
Prb3       0.8 0.4 Sam1-Sam3-Sam4-Sam5 Sam4-Sam5
Prb4       0.4 NA  Sam4-Sam5           NA"

dfout <- read.table(text=dfout, header=TRUE)

Any ideas?


Solution

  • This is the approach I would take, using a for() loop and if statements for clarity (these could be collapsed and vectorized if efficiency is of utmost importance):

    df <-  "Sam1  Sam2 Sam3 Sam4 Sam5 
    Prb1  0       0    1    2    3    
    Prb2  0       0    1    2    2    
    Prb3  0       1    1    2    2    
    Prb4  2       2    3    2    2" 
    
    df <- read.table(text=df, header=TRUE)
    
    
    for (i in seq_len(nrow(df))) {
      if (i > 1) {
        # compare this probe with the one above it
        Sams <- df[i-1,1:5] == df[i,1:5]
        df[i,"Abo"] <- sum(Sams)/5
        df[i,"SamsA"] <- paste(names(df)[1:5][Sams], collapse="-")
      }
      if (i < nrow(df)) {
        # compare this probe with the one below it
        Sams <- df[i+1,1:5] == df[i,1:5]
        df[i,"Bel"] <- sum(Sams)/5
        df[i,"SamsB"] <- paste(names(df)[1:5][Sams], collapse="-")
      }
    }
    
    # select the new columns by name rather than by position
    out <- df[,c("Abo","Bel","SamsA","SamsB")]
    

    The out object looks like this:

    > out
         Abo Bel               SamsA               SamsB
    Prb1  NA 0.8                <NA> Sam1-Sam2-Sam3-Sam4
    Prb2 0.8 0.8 Sam1-Sam2-Sam3-Sam4 Sam1-Sam3-Sam4-Sam5
    Prb3 0.8 0.4 Sam1-Sam3-Sam4-Sam5           Sam4-Sam5
    Prb4 0.4  NA           Sam4-Sam5                <NA>
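
The answer notes that the loop could be vectorized. One possible way to do that, shown here only as a sketch, is to compare the matrix without its last row against the matrix without its first row, so that every pair of adjacent probes is compared at once. The names `m`, `same`, `frac`, `label` and `out2` are introduced here for illustration and are not part of the original answer; the sketch also assumes that all sample columns are numeric and that there are at least two probes.

```r
df <- read.table(text = "Sam1 Sam2 Sam3 Sam4 Sam5
Prb1 0 0 1 2 3
Prb2 0 0 1 2 2
Prb3 0 1 1 2 2
Prb4 2 2 3 2 2", header = TRUE)

m <- as.matrix(df)   # probes in rows, samples in columns
n <- nrow(m)

# same[k, ] is TRUE where probe k and probe k + 1 hold the same value
same <- m[-n, , drop = FALSE] == m[-1, , drop = FALSE]

# fraction of coincident samples for each adjacent pair of probes
frac <- unname(rowMeans(same))

# names of the coincident samples for each adjacent pair, joined by "-"
label <- unname(apply(same, 1, function(x) paste(colnames(m)[x], collapse = "-")))

# pair k is "below" for probe k and "above" for probe k + 1
out2 <- data.frame(
  Abo   = c(NA, frac),
  Bel   = c(frac, NA),
  SamsA = c(NA, label),
  SamsB = c(label, NA),
  row.names = rownames(df)
)
```

Each adjacent pair is compared only once here, and the result is shifted by one position to fill the "above" and "below" columns, which is what lets the two if branches of the loop collapse into a single comparison.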