I would like to estimate the proportion of samples (a value between 0 and 1), i.e. the Sam columns, that have exactly the same values in the probe above and in the probe below (designated Abo and Bel in dfout, respectively), AND to list which samples coincide between each probe and the probe above or below it (designated SamsA and SamsB in dfout, respectively).
The input df:
df <- "Sam1 Sam2 Sam3 Sam4 Sam5
Prb1 0 0 1 2 3
Prb2 0 0 1 2 2
Prb3 0 1 1 2 2
Prb4 2 2 3 2 2"
df <- read.table(text=df, header=T)
The expected output dfout:
dfout <- "Abo Bel SamsA SamsB
Prb1 NA 0.8 NA Sam1-Sam2-Sam3-Sam4
Prb2 0.8 0.8 Sam1-Sam2-Sam3-Sam4 Sam1-Sam3-Sam4-Sam5
Prb3 0.8 0.4 Sam1-Sam3-Sam4-Sam5 Sam4-Sam5
Prb4 0.4 NA Sam4-Sam5 NA"
dfout <- read.table(text=dfout, header=T)
Any ideas?
This is the approach I would take, using a for() loop and if statements for clarity (these could be collapsed and vectorized if efficiency is of utmost importance):
df <- "Sam1 Sam2 Sam3 Sam4 Sam5
Prb1 0 0 1 2 3
Prb2 0 0 1 2 2
Prb3 0 1 1 2 2
Prb4 2 2 3 2 2"
df <- read.table(text=df, header=T)
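samCols <- names(df)[1:5]   # the sample columns Sam1..Sam5
for (i in seq_len(nrow(df))) {
  # compare with the probe above (the first probe has none)
  if (i > 1) {
    Sams <- df[i-1, samCols] == df[i, samCols]   # 1 x 5 logical matrix
    df[i, "Abo"] <- sum(Sams) / length(samCols)
    df[i, "SamsA"] <- paste(samCols[Sams], collapse="-")
  }
  # compare with the probe below (the last probe has none)
  if (i < nrow(df)) {
    Sams <- df[i+1, samCols] == df[i, samCols]
    df[i, "Bel"] <- sum(Sams) / length(samCols)
    df[i, "SamsB"] <- paste(samCols[Sams], collapse="-")
  }
}
# select the new columns in the order used by dfout
out <- df[, c("Abo", "Bel", "SamsA", "SamsB")]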
The out object looks like this:
> out
Abo Bel SamsA SamsB
Prb1 NA 0.8 <NA> Sam1-Sam2-Sam3-Sam4
Prb2 0.8 0.8 Sam1-Sam2-Sam3-Sam4 Sam1-Sam3-Sam4-Sam5
Prb3 0.8 0.4 Sam1-Sam3-Sam4-Sam5 Sam4-Sam5
Prb4 0.4 NA Sam4-Sam5 <NA>
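Since the loop could be vectorized, here is a minimal sketch of one way to do it. This is an alternative, not the code above, and it assumes that every column of df is a sample column and that there are at least two probes. Comparing the matrix without its last row against the matrix without its first row gives, in one step, the matches between each probe and the next; the "above" and "below" columns are then the same results padded with NA at opposite ends. The names m, eq, frac and labs are just illustrative.

```r
df <- read.table(text = "Sam1 Sam2 Sam3 Sam4 Sam5
Prb1 0 0 1 2 3
Prb2 0 0 1 2 2
Prb3 0 1 1 2 2
Prb4 2 2 3 2 2", header = TRUE)

m <- as.matrix(df)   # probes x samples (assumes all columns are samples)
n <- nrow(m)         # assumes n >= 2

# eq[k, ] is TRUE where probe k and probe k+1 have the same value
eq <- m[-n, , drop = FALSE] == m[-1, , drop = FALSE]

frac <- rowMeans(eq)   # proportion of matching samples per adjacent pair
labs <- apply(eq, 1, function(r) paste(colnames(m)[r], collapse = "-"))

# pair k is "below" for probe k and "above" for probe k+1
out <- data.frame(
  Abo   = c(NA, frac),
  Bel   = c(frac, NA),
  SamsA = c(NA, labs),
  SamsB = c(labs, NA),
  row.names = rownames(m),
  stringsAsFactors = FALSE
)
out
```

This avoids growing the data frame row by row, which is what usually makes such loops slow on large inputs.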