I have a tab-delimited file, abc.txt:
contig score guide
1:100-101 7 AAA
1:100-101 6 BBB
1:100-101 5 CCC
1:100-101 4 DDD
1:100-101 3 EEE
1:100-101 2 FFF
1:100-101 1 GGG
1:100-101 90 HHH
1:100-101 111 III
1:100-101 1111 JJJ
1:200-203 503.5333333 KKK
1:200-203 570.7212121 LLL
1:200-203 637.9090909 MMM
1:200-203 705.0969697 NNN
1:200-203 772.2848485 OOO
1:200-203 839.4727273 PPP
1:200-203 906.6606061 QQQ
1:200-203 973.8484848 RRR
2:300-301 1041.036364 SSS
2:300-301 1108.224242 TTT
2:300-301 1175.412121 UUU
2:300-301 1242.6 VVV
2:300-301 1309.787879 ABC
2:300-301 1376.975758 CGA
2:300-301 1444.163636 ACD
Column 1 (contig) has repeated values, column 2 has scores, and column 3 has the guide letters corresponding to the column 2 scores. For each distinct value in column 1 (contig), I need to select the top 5 scores and print their corresponding column 3 values.
The output should look like this, with the first column holding the unique contig entry and the next 10 columns holding the top 5 scores and their corresponding guide letters:
Contig Score-1 Guide-1 Score-2 Guide-2 Score-3 Guide-3 Score-4 Guide-4 Score-5 Guide-5
1:100-101 1111 JJJ 111 III 90 HHH 7 AAA 6 BBB
1:200-203 973.8484848 RRR 906.6606061 QQQ 839.4727273 PPP 772.2848485 OOO 705.0969697 NNN
2:300-301 1444.163636 ACD 1376.975758 CGA 1309.787879 ABC 1242.6 VVV 1175.412121 UUU
I tried the dplyr and DescTools packages, but I am running into an error:
library(dplyr)
library(DescTools)
file <- "abc.txt"
x=read.table(file)
b <- Large(x, k=5, unique = FALSE, na.last=NA)
and I get this error:
Error in Large(x, k = 5, unique = FALSE, na.last = NA) :
Not compatible with requested type: [type=character; target=double].
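The type=character part of the message can be reproduced on a small inline sample. This is a sketch assuming the file's first line holds the column names; the three data rows below are a hypothetical subset of abc.txt, read with read.table(text = ...) instead of from the file. Without header = TRUE, read.table() treats the header line as a data row, so the score column mixes the word "score" with numbers and cannot be numeric.

```r
# Hypothetical inline subset standing in for abc.txt
lines <- "contig score guide
1:100-101 7 AAA
1:100-101 1111 JJJ
1:200-203 503.5333333 KKK"

# Default header = FALSE: the header line becomes a data row, so column V2
# holds "score" plus numbers and stays character (a factor before R 4.0)
bad <- read.table(text = lines)
print(sapply(bad, class))

# header = TRUE: the first line supplies the column names and
# the score column is parsed as numeric
good <- read.table(text = lines, header = TRUE)
print(sapply(good, class))
```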
I managed to do this in Excel using the SUMPRODUCT, LARGE, IFERROR and VLOOKUP formulas, but for large datasets I want to do it in R.
Any help will be much appreciated.
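For reference, the R counterpart of applying Excel's LARGE(range, k) to one contig is to sort that contig's numeric scores in decreasing order and take the first k. A minimal base R sketch, assuming the data is read with header = TRUE (the inline rows are a hypothetical subset of abc.txt), using sort() in place of DescTools::Large() so that no extra package is needed:

```r
# Hypothetical inline subset of abc.txt, read with a header row
x <- read.table(text = "contig score guide
1:100-101 7 AAA
1:100-101 6 BBB
1:100-101 1 GGG
1:100-101 90 HHH
1:100-101 111 III
1:100-101 1111 JJJ
2:300-301 1242.6 VVV
2:300-301 1444.163636 ACD", header = TRUE)

# Split the numeric score vector by contig, then keep the 5 largest
# values of each group; head() does not pad groups shorter than 5
top5 <- lapply(split(x$score, x$contig),
               function(s) head(sort(s, decreasing = TRUE), 5))
print(top5)
```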
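The problem is that Large() expects a numeric vector, not an entire data frame. The error message also points to a second issue: read.table() is called without header = TRUE, so the header line is read as a data row and the score column comes in as text (type=character) instead of numbers. This is just a guess since I don't have a reproducible example, but you might want to do something along these lines: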
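library(dplyr)

file <- "./abc.txt"
# header = TRUE keeps the header line out of the data, so the score
# column is numeric and sorts by value rather than as text
x <- read.table(file, header = TRUE)

# one data frame per contig
groups <- split(x, f = x$contig)

# titles in the requested order: Contig, Score-1, Guide-1, ..., Score-5, Guide-5
columntitles <- "Contig"
for (i in 1:5)
  columntitles <- c(columntitles, paste0("Score-", i), paste0("Guide-", i))

rows <- vector("list", length(groups))
for (i in seq_along(groups)) {
  # keep the 5 highest scores for this contig, sorted in decreasing order
  partialdata <- groups[[i]] %>% top_n(5, score)
  partialdata <- partialdata[order(partialdata$score, decreasing = TRUE), ]
  # each output row starts with the contig name, followed by score/guide pairs
  singlerow <- names(groups)[i]
  for (j in 1:5) {
    singlerow <- c(singlerow, toString(partialdata$score[j]), toString(partialdata$guide[j]))
  }
  rows[[i]] <- singlerow
}
result <- as.data.frame(do.call(rbind, rows), stringsAsFactors = FALSE)
colnames(result) <- columntitles
result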
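A more compact alternative, offered as a sketch rather than a drop-in replacement: base R can number the rows within each contig with ave() after sorting, then spread them into one row per contig with reshape(). The inline data copies the sample from the question; for the real file you would read it with read.table("abc.txt", header = TRUE). Note that the columns come out as score.1, guide.1, ..., which is reshape()'s default naming, not the Score-1/Guide-1 titles in the question.

```r
# Inline copy of the sample data; for the real file use
# x <- read.table("abc.txt", header = TRUE)
x <- read.table(text = "contig score guide
1:100-101 7 AAA
1:100-101 6 BBB
1:100-101 5 CCC
1:100-101 4 DDD
1:100-101 3 EEE
1:100-101 2 FFF
1:100-101 1 GGG
1:100-101 90 HHH
1:100-101 111 III
1:100-101 1111 JJJ
1:200-203 503.5333333 KKK
1:200-203 570.7212121 LLL
1:200-203 637.9090909 MMM
1:200-203 705.0969697 NNN
1:200-203 772.2848485 OOO
1:200-203 839.4727273 PPP
1:200-203 906.6606061 QQQ
1:200-203 973.8484848 RRR
2:300-301 1041.036364 SSS
2:300-301 1108.224242 TTT
2:300-301 1175.412121 UUU
2:300-301 1242.6 VVV
2:300-301 1309.787879 ABC
2:300-301 1376.975758 CGA
2:300-301 1444.163636 ACD", header = TRUE)

# Sort by contig, then by decreasing score within each contig
x <- x[order(x$contig, -x$score), ]

# Number the rows within each contig: 1 = highest score
x$rank <- ave(x$score, x$contig, FUN = seq_along)

# Keep the top 5 rows per contig
top <- x[x$rank <= 5, ]

# One row per contig: contig, score.1, guide.1, ..., score.5, guide.5
wide <- reshape(top, idvar = "contig", timevar = "rank", direction = "wide")
print(wide, row.names = FALSE)
```

Because the ranks are assigned after sorting, ties are broken by their order in the file, and a contig with fewer than five rows gets NA in the missing columns.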