Here is an example of my space-delimited data, which has 796 rows in total:
Locus GSL Barents Ireland
1 cgpGmo-S1001 0.25805 0.00339 0.02252
2 cgpGmo-S1006 0.11041 0.04298 0.06036
3 cgpGmo-S1007 0.24085 0.08937 0.03964
4 cgpGmo-S1008 0.07428 0.10824 0.01802
5 cgpGmo-S1009 0.08524 0.01471 0.00000
6 cgpGmo-S1013 0.03547 0.05091 0.00991
What I am seeking to do is isolate the top quartile (25%) for each of the three categories and then draw a Venn diagram showing the number of loci (rows) whose values are in the top 25% for one, two, or all three categories.
I am fairly sure I can use the VennDiagram package to create the diagrams, but I am unsure how to generate the lists of loci that fall in the top 25% of each category to use as input for the Venn diagram.
A simple case - sort and get the lowest 25%:
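# two identical columns holding 100, 99, ..., 1
a <- seq(100, 1, -1)
b <- seq(100, 1, -1)
d <- data.frame(col1 = a, col2 = b)
# sort ascending and keep the first quarter of the values
sort(d$col1)[1:(length(d$col1)/4)]
This sorts the column and returns the lowest 25% of the values. For the top 25% instead, sort with decreasing = TRUE.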
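Alternatively, use order, which returns the sorting indices rather than the sorted values. It still sorts, but the same indices can be reused to pull out the matching rows or names:
d$col1[order(d$col1)][1:(length(d$col1)/4)]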
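Since the question asks for the top 25% rather than the lowest, here is a minimal sketch of one way to get the locus lists, using quantile() to find the 75th-percentile cutoff in each column. The names dat, pops and top_loci are made up for illustration, and only the six sample rows from the question are used, so the cutoffs will differ from those of the full 796-row data.

```r
# Six sample rows copied from the question (the real data has 796 rows).
# The header has one fewer field than the data rows, so read.table()
# treats the first column (1, 2, 3, ...) as row names.
dat <- read.table(text = "Locus GSL Barents Ireland
1 cgpGmo-S1001 0.25805 0.00339 0.02252
2 cgpGmo-S1006 0.11041 0.04298 0.06036
3 cgpGmo-S1007 0.24085 0.08937 0.03964
4 cgpGmo-S1008 0.07428 0.10824 0.01802
5 cgpGmo-S1009 0.08524 0.01471 0.00000
6 cgpGmo-S1013 0.03547 0.05091 0.00991",
                  header = TRUE, stringsAsFactors = FALSE)

pops <- c("GSL", "Barents", "Ireland")

# For each population column, keep the loci whose value is at or above
# that column's 75th percentile (i.e. in the top quartile).
top_loci <- lapply(dat[pops], function(x) dat$Locus[x >= quantile(x, 0.75)])

# Loci that are in the top quartile for all three populations.
shared_all <- Reduce(intersect, top_loci)

print(top_loci)
print(shared_all)
```

A named list like top_loci should be usable as the input to venn.diagram() in the VennDiagram package, though I have not run that step here. Note that ties at the cutoff are all kept, so a column can end up with slightly more than 25% of the loci.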