
Sorting in R with dplyr: How to sort by category in one column based on sum of category in another column?


I have an example data frame below. I need it sorted by Type, then Species, then BdFt. The example is nearly correct, but I don't want Species sorted alphabetically. Instead, I would like to order the species by the sum of BdFt for each species (within each Type), in descending order. For example, within Type 4404, 'DF' should appear first. I also want to keep the current number of observations, so I don't want to collapse the rows by species group. Could anyone help me achieve this, perhaps with dplyr?

-Brandon

   Type Species  BdFt
   4404      BB   164
   4404      BB    55
   4404      BM   831
   4404      BM   419
   4404      BM   242
   4404      BM    20
   4404      CH   565
   4404      CH   206
   4404      CH    88
   4404      CO  1817
   4404      CO   531
   4404      CO   286
   4404      CO    31
   4404      DF 19740
   4404      DF  5930
   4404      DF   613
   4404      DF   468
   4404      DF   167
   4404      GF   360
   4404      GF   232
   4404      GF   124
   4410      BM   909
   4410      CH   161
   4410      DF 18756
   4410      GF  3642
   4410      RA   549

Solution

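  • Here is one option with arrange. ave() returns the sum of BdFt for each Species/Type combination, repeated on every row of that group, so no rows are collapsed; desc() then sorts those sums in descending order within each Type.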

    library(dplyr)
    df2 <- df1 %>% 
           arrange(Type, desc(ave(BdFt, Species, Type,  FUN = sum))) 
    
    df2
    #   Type Species  BdFt
    #1  4404      DF 19740
    #2  4404      DF  5930
    #3  4404      DF   613
    #4  4404      DF   468
    #5  4404      DF   167
    #6  4404      CO  1817
    #7  4404      CO   531
    #8  4404      CO   286
    #9  4404      CO    31
    #10 4404      BM   831
    #11 4404      BM   419
    #12 4404      BM   242
    #13 4404      BM    20
    #14 4404      CH   565
    #15 4404      CH   206
    #16 4404      CH    88
    #17 4404      GF   360
    #18 4404      GF   232
    #19 4404      GF   124
    #20 4404      BB   164
    #21 4404      BB    55
    #22 4410      DF 18756
    #23 4410      GF  3642
    #24 4410      BM   909
    #25 4410      RA   549
    #26 4410      CH   161
    

    Or with order from base R

    df1[with(df1, order(Type, -ave(BdFt, Species, Type,  FUN = sum))),]
    

    data

    df1 <- structure(list(Type = c(4404L, 4404L, 4404L, 4404L, 4404L, 4404L, 
    4404L, 4404L, 4404L, 4404L, 4404L, 4404L, 4404L, 4404L, 4404L, 
    4404L, 4404L, 4404L, 4404L, 4404L, 4404L, 4410L, 4410L, 4410L, 
    4410L, 4410L), Species = c("BB", "BB", "BM", "BM", "BM", "BM", 
    "CH", "CH", "CH", "CO", "CO", "CO", "CO", "DF", "DF", "DF", "DF", 
    "DF", "GF", "GF", "GF", "BM", "CH", "DF", "GF", "RA"), BdFt = c(164L, 
    55L, 831L, 419L, 242L, 20L, 565L, 206L, 88L, 1817L, 531L, 286L, 
    31L, 19740L, 5930L, 613L, 468L, 167L, 360L, 232L, 124L, 909L, 
    161L, 18756L, 3642L, 549L)), class = "data.frame", row.names = c(NA, 
    -26L))
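
A variation that may be worth considering: the calls above sort only on Type and the per-species total, so the order of BdFt within a species is inherited from the input rather than sorted explicitly, and two species with equal totals inside one Type could end up interleaved. The sketch below is one way to make both explicit, under the assumption that ties should fall back to alphabetical Species and that BdFt should be descending within each species. The names `SpeciesTotal` and `df3` are made up for illustration and are not part of the original answer.

```r
library(dplyr)

# Same data as df1 above, rebuilt compactly so the snippet runs on its own.
df1 <- data.frame(
  Type    = rep(c(4404L, 4410L), times = c(21, 5)),
  Species = c(rep(c("BB", "BM", "CH", "CO", "DF", "GF"),
                  times = c(2, 4, 3, 4, 5, 3)),
              "BM", "CH", "DF", "GF", "RA"),
  BdFt    = c(164L, 55L, 831L, 419L, 242L, 20L, 565L, 206L, 88L, 1817L,
              531L, 286L, 31L, 19740L, 5930L, 613L, 468L, 167L, 360L,
              232L, 124L, 909L, 161L, 18756L, 3642L, 549L)
)

df3 <- df1 %>%
  group_by(Type, Species) %>%
  # mutate() (not summarise()) keeps every row and attaches the group sum to each
  mutate(SpeciesTotal = sum(BdFt)) %>%
  ungroup() %>%
  # Species breaks ties between equal totals (assumed preference);
  # desc(BdFt) orders the rows inside each species explicitly
  arrange(Type, desc(SpeciesTotal), Species, desc(BdFt)) %>%
  # drop the helper column (hypothetical name) to restore the original shape
  select(-SpeciesTotal)

df3
```

On this data the row order should match df2 above, since no two species share a total within a Type and the input is already descending by BdFt within each species. Note that df3 comes back as a tibble; wrap it in as.data.frame() if a plain data frame is needed.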