Search code examples
rdataframedata.tabletopmost

R - Find three top values based on other column


I have a data frame (but could be data table too) with the fraction of vegetation types (first column "type") in multiple sites (remaining columns "a" to "l"). This is what it looks like:

dat <- structure(list(type = structure(1:17, class = "factor", .Label = c("not_vegetated", 
"needleleaf_evergreen_temperate_tree", "needleleaf_evergreen_boreal_tree", 
"needleleaf_deciduous_boreal_tree", "broadleaf_evergreen_tropical_tree", 
"broadleaf_evergreen_temperate_tree", "broadleaf_deciduous_tropical_tree", 
"broadleaf_deciduous_temperate_tree", "broadleaf_deciduous_boreal_tree", 
"broadleaf_evergreen_shrub", "broadleaf_deciduous_temperate_shrub", 
"broadleaf_deciduous_boreal_shrub", "c3_arctic_grass", "c3_non.arctic_grass", 
"c4_grass", "c3_crop", "c3_irrigated")), a = c("0", "1.55", "0", 
"0", "0", "0", "0", "11.59", "0", "0", "0", "0", "0", "31.5", 
"0.26", "52.29", "0"), b = c("0", "8.27", "0.02", "0", "0", "0", 
"0", "35.5", "0.05", "0", "0.04", "0", "0", "26.02", "0", "15.7", 
"0"), c = c("1.42", "7.55", "0", "0", "0", "0", "0", "14.24", 
"2.38", "0", "0.06", "0", "0", "31.79", "0", "36.56", "0"), d = c("0", 
"13.87", "3.97", "0", "0", "0", "0", "51.66", "7.68", "0", "0.07", 
"0", "0", "8.18", "0", "10.23", "0"), e = c("0", "16.23", "0.24", 
"0", "0", "0", "0", "67.15", "2.12", "0", "0", "0", "0", "6.52", 
"0.1", "4.68", "0"), f = c("0.26", "6.29", "60.98", "0", "0", 
"0", "0", "3.72", "10.49", "0", "0", "0.3", "2.45", "3.61", "0", 
"3.9", "0"), g = c("0.11", "38.16", "10.9", "0", "0", "0", "0", 
"31.72", "13.53", "0", "0", "0", "0", "0.68", "0", "0.9", "0"
), h = c("0", "10.42", "0.42", "0", "0", "0", "0", "55.44", "1.49", 
"0", "0", "0", "0", "16.54", "0", "13.33", "0"), i = c("0", "1.39", 
"0", "0", "0", "0", "0", "11.56", "0", "0", "0", "0", "0", "33.94", 
"0", "49.26", "0"), j = c("0.45", "16.48", "0", "0", "0", "0", 
"0", "40.02", "0", "0", "0.02", "0", "0", "17.53", "0", "10.51", 
"0"), k = c("6.02", "2.78", "0", "0", "0", "0", "0", "1.27", 
"2.51", "0", "0.03", "0", "0", "18.29", "0", "69.1", "0"), l = c("6.13", 
"22.77", "2.72", "0", "0", "0", "0", "22.69", "3.85", "0", "0.06", 
"0", "0", "1.25", "0", "1.53", "0")), .Names = c("type", 
"a", "b", "c", "d", "e", "f", "g", "h", "i", "j", "k", "l"), row.names = c(NA, 
-17L), class = "data.frame")

What I need to do is calculate the three more abundant vegetation type fractions per site, and summarize it in a data frame (or data table) similar to this (showing only the first two sites for the sake of brevity):

            a                                                   b
c3_crops (52.29)                                broadleaf_deciduous_temperate_tree (35.5)
c3_non.arctic_grass (31.5)                      c3_non.arctic_grass (26.02)
broadleaf_deciduous_temperate_tree (11.59)      c3_crop (15.7)

Any tips on how to do that?


Solution

  • Here is one way for you. If I am reading your question right, you want to take three sites with three highest numbers. You'd start by formatting data in long format, convert character to numeric, and sort the data by site and value. Then, you define groups by site and take the first three rows for each group.

    library(dplyr)
    library(tidyr)
    
    gather(dat, key = site, value = value, -type) %>%
    mutate(value = as.numeric(value)) %>%
    arrange(site, desc(value)) %>%
    group_by(site) %>%
    top_n(3)
    
       type                               site  value
       <fct>                              <chr> <dbl>
     1 c3_crop                            a      52.3
     2 c3_non.arctic_grass                a      31.5
     3 broadleaf_deciduous_temperate_tree a      11.6
     4 broadleaf_deciduous_temperate_tree b      35.5
     5 c3_non.arctic_grass                b      26.0
     6 c3_crop                            b      15.7
     7 c3_crop                            c      36.6
     8 c3_non.arctic_grass                c      31.8
     9 broadleaf_deciduous_temperate_tree c      14.2
    10 broadleaf_deciduous_temperate_tree d      51.7
    # ... with 26 more rows