
How to calculate new value based on sum of duplicate nominal values in R


I have two columns of data: one for a variable and one for the area that variable occurs in.

 veg_dominant  Shape_Area
        Hm1.1   28216.344
  Bp1.2molcae   6509.464
  Bp1.2molcae   43518.162
        Hm1.1   21348.608
        Hm1.1   14529.108
        Hm1.1   18050.676

I want to sum Shape_Area over all rows that share the same veg_dominant. For example, the two Bp1.2molcae rows sum to 50027.626, so I want that number to appear next to both rows that contain Bp1.2molcae. The same goes for Hm1.1. In the end, I want a new data frame with just each unique veg_dominant value and the sum of its Shape_Area.

Expected output for the example above would be:

veg_dominant      Shape_Area
Hm1.1             82144.736
Bp1.2molcae       50027.626

I have a lot of rows, but below is the dput() output for just the head shown in the example above.

structure(list(veg_dominant = structure(c(59L, 14L, 14L, 59L, 
59L, 59L), .Label = c("", "Bb1.1.1", "bebouwing", "Beuk", "bos", 
"Bp", "Bp1.1", "Bp1.1.1", "Bp1.1.3", "Bp1.1.3loof", "Bp1.1Calluna", 
"Bp1.2", "Bp1.2desflex", "Bp1.2molcae", "Bp1.3", "Bq11.1", "Bq3.2.5", 
"Bq4.1betpin", "Bq4.1querob", "Bq5.2pinbet", "Bq6.1desflex", 
"Bq6.1molcae", "Bq6.2", "Bq6.2molcae", "Bq9", "Bq99.1", "Bq99.2", 
"Bq99.3", "Dd1", "Dd2.1", "Dd2.2", "Dd3", "Dd5.1", "E00", "G01a", 
"G02", "G03", "G04", "G05", "G06", "G07", "Gc04", "Gc1", "Gc2", 
"Gc3", "Gc4", "grasland", "Grasland ", "H01", "H03", "Hc1", "Hc2", 
"Hc3", "Hc3_0", "Hc3_3", "Hc3Cp", "Hc3t", "hm1.1", "Hm1.1", "Hm1.1_3", 
"Hm1.2", "Hp1.1", "Hpc", "Hv", "jeneverbesstruweel", "Oefendorp", 
"open zand", "Open zand", "opslag", "Opslag", "P02", "Sj1.1", 
"weg", "x", "x00", "X00"), class = "factor"), Shape_Area = c(28216.3437, 
6509.46415, 43518.16186, 21348.60848, 14529.10796, 18050.6759
)), row.names = c(NA, 6L), class = "data.frame")

Solution

  • A grouped sum gives this directly: group by veg_dominant and sum Shape_Area within each group

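    library(dplyr)

    # df1 is the data frame from the question
    df1 %>%
      group_by(veg_dominant) %>%                 # one group per vegetation type
      summarise(Shape_Area = sum(Shape_Area))    # total area within each group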
    

    Or in base R, with the formula interface of aggregate():

    aggregate(Shape_Area ~ veg_dominant, df1, sum)
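
The question also mentions showing the group total next to every row rather than collapsing to one row per group. A minimal sketch of that variant in base R, assuming the data frame is called df1 as above; the example data is retyped from the question's table and the column name Area_Sum is a hypothetical choice:

```r
# Example data retyped from the table in the question (values as displayed there)
df1 <- data.frame(
  veg_dominant = factor(c("Hm1.1", "Bp1.2molcae", "Bp1.2molcae",
                          "Hm1.1", "Hm1.1", "Hm1.1")),
  Shape_Area = c(28216.344, 6509.464, 43518.162,
                 21348.608, 14529.108, 18050.676)
)

# ave() applies FUN within each group and returns a vector as long as the
# input, so every row receives the total of its own veg_dominant group.
# Area_Sum is a hypothetical column name chosen for this sketch.
df1$Area_Sum <- ave(df1$Shape_Area, df1$veg_dominant, FUN = sum)

print(df1)
```

With dplyr, the same idea would be group_by(veg_dominant) followed by mutate() instead of summarise(), since mutate() keeps all rows.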