r, subset, venn-diagram

Creating subsets of the highest 25% of values and using them in a Venn diagram


Here is an example of my space-delimited data, which has 796 rows in total:

         Locus     GSL Barents Ireland
1 cgpGmo-S1001 0.25805 0.00339 0.02252
2 cgpGmo-S1006 0.11041 0.04298 0.06036
3 cgpGmo-S1007 0.24085 0.08937 0.03964
4 cgpGmo-S1008 0.07428 0.10824 0.01802
5 cgpGmo-S1009 0.08524 0.01471 0.00000
6 cgpGmo-S1013 0.03547 0.05091 0.00991

What I want to do is isolate the top quartile (25%) for each of the three categories and then draw a Venn diagram showing the number of loci (rows) whose values are in the top 25% for one, two, or all three categories.

I am fairly sure I can use the VennDiagram package to create the diagrams, but I am unsure how to generate the lists of loci that fall in the top 25% of each category to use as input for the Venn diagram.


Solution

  • A simple case - sort and get the lowest 25%:

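    # Two example columns holding the values 100 down to 1
    a <- seq(100, 1, -1)
    b <- seq(100, 1, -1)
    d <- data.frame(col1 = a, col2 = b)
    # Sort ascending and keep the first quarter of the values
    sort(d$col1)[1:(length(d$col1)/4)]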
    

    This sorts the column in ascending order and returns the lowest 25% of the values. For the highest 25%, sort with decreasing = TRUE instead.

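    Alternatively, use order, which returns the row indices in sorted order rather than the values themselves, so the same indices can also be used to pull out the matching rows:

    d$col1[order(d$col1)][1:(length(d$col1)/4)]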
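
  • Applied to the data in the question, one possible approach is sketched below. It is a minimal sketch rather than a definitive solution: it assumes the top quartile means values at or above the 75th percentile of each column (so ties at the cutoff are included), and it assumes the VennDiagram package is installed for the optional drawing step. The names dat, pops and top_loci are illustrative, and only the six rows shown in the question are used.

```r
# Toy data in the same shape as the question (first six rows only).
dat <- read.table(text = "
Locus GSL Barents Ireland
cgpGmo-S1001 0.25805 0.00339 0.02252
cgpGmo-S1006 0.11041 0.04298 0.06036
cgpGmo-S1007 0.24085 0.08937 0.03964
cgpGmo-S1008 0.07428 0.10824 0.01802
cgpGmo-S1009 0.08524 0.01471 0.00000
cgpGmo-S1013 0.03547 0.05091 0.00991
", header = TRUE, stringsAsFactors = FALSE)

# For each category column, keep the loci whose value is at or above
# the 75th percentile of that column (assumed definition of "top 25%").
pops <- c("GSL", "Barents", "Ireland")
top_loci <- lapply(dat[pops], function(x) dat$Locus[x >= quantile(x, 0.75)])

# Optional: draw the diagram only if the VennDiagram package is available.
# filename = NULL makes venn.diagram return a grid object instead of writing a file.
if (requireNamespace("VennDiagram", quietly = TRUE)) {
  venn <- VennDiagram::venn.diagram(top_loci, filename = NULL)
  grid::grid.newpage()
  grid::grid.draw(venn)
}
```

    top_loci is a named list with one character vector of locus names per category, which is the kind of input venn.diagram expects. With the full 796-row file, the read.table call would presumably point at the file instead of the inline text.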