
How to tapply in dplyr and create a new column


I'm stuck with dplyr (again!) and trying to solve my problem without dying in the attempt.

The first lines of my df look like this:

df <- structure(list(fecha = c(1990, 1990, 1990, 1990, 1990, 1990, 
1990, 1990, 1990, 1990, 1990, 1990, 1990, 1990, 1990), cientifico = structure(c(1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = "Argentina sphyraena", class = "factor"), 
    dem_sect = structure(c(2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
    2L, 2L, 2L, 2L, 2L, 2L), .Label = c("AB", "EP", "FE", "MF", 
    "PA"), class = "factor"), sector = c("EPb", "EPc", "EPc", 
    "EPb", "EPa", "EPa", "EPb", "EPc", "EPb", "EPb", "EPb", "EPb", 
    "EPb", "EPb", "EPa"), md_area = c(3010.44, 665.88, 665.88, 
    3010.44, 1273.65, 1273.65, 3010.44, 665.88, 3010.44, 3010.44, 
    3010.44, 3010.44, 3010.44, 3010.44, 1273.65), md_peso = c(1.42957605985037, 
    1.04499099099099, 1.04499099099099, 1.42957605985037, 1.24025925925926, 
    1.24025925925926, 1.42957605985037, 1.04499099099099, 1.42957605985037, 
    1.42957605985037, 1.42957605985037, 1.42957605985037, 1.42957605985037, 
    1.42957605985037, 1.24025925925926), dummy = c(4303.65295361596, 
    695.838601081081, 695.838601081081, 4303.65295361596, 1579.65620555556, 
    1579.65620555556, 4303.65295361596, 695.838601081081, 4303.65295361596, 
    4303.65295361596, 4303.65295361596, 4303.65295361596, 4303.65295361596, 
    4303.65295361596, 1579.65620555556)), row.names = c(NA, -15L
), class = "data.frame")

I'm trying to "translate" this into dplyr: sumsect <- tapply(md_peso * md_area, as.factor(substr(names(sector), 1, 2)), sum). I have tried many approaches without success. I also added a column ("dem_sect") holding the result of as.factor(substr(names(sector), 1, 2)) in an attempt to solve the problem, but that failed too.
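For reference, here is a minimal base R sketch of what the tapply call computes. It assumes (this is not stated in the sample data) that md_peso, md_area and sector are vectors with one entry per sector, named by sector, as the use of names(sector) suggests. The three values per vector are copied from the df above.

```r
# Assumption: one value per sector, stored in vectors named by sector,
# which is what names(sector) in the original call suggests.
sector  <- c(EPa = "EPa", EPb = "EPb", EPc = "EPc")
md_area <- c(EPa = 1273.65, EPb = 3010.44, EPc = 665.88)
md_peso <- c(EPa = 1.24025925925926, EPb = 1.42957605985037, EPc = 1.04499099099099)

# Group the per-sector products by the first two letters of the sector name.
sumsect <- tapply(md_peso * md_area, as.factor(substr(names(sector), 1, 2)), sum)
sumsect  # a single group, EP, with a sum of about 6579.148
```

Each sector contributes its product once, so the repeated rows in df must not be summed directly.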

The desired output is the same data frame with a new column, "sumsect", holding the same value in every row: in this case 6579.148, the sum of md_peso * md_area taken once per sector (1579.6562 + 4303.6530 + 695.8386):

   fecha          cientifico dem_sect sector md_area  md_peso     dummy  sumsect
1   1990 Argentina sphyraena       EP    EPb 3010.44 1.429576 4303.6530 6579.148
2   1990 Argentina sphyraena       EP    EPc  665.88 1.044991  695.8386 6579.148
3   1990 Argentina sphyraena       EP    EPc  665.88 1.044991  695.8386 6579.148
4   1990 Argentina sphyraena       EP    EPb 3010.44 1.429576 4303.6530 6579.148
5   1990 Argentina sphyraena       EP    EPa 1273.65 1.240259 1579.6562 6579.148
6   1990 Argentina sphyraena       EP    EPa 1273.65 1.240259 1579.6562 6579.148
7   1990 Argentina sphyraena       EP    EPb 3010.44 1.429576 4303.6530 6579.148
8   1990 Argentina sphyraena       EP    EPc  665.88 1.044991  695.8386 6579.148
9   1990 Argentina sphyraena       EP    EPb 3010.44 1.429576 4303.6530 6579.148
10  1990 Argentina sphyraena       EP    EPb 3010.44 1.429576 4303.6530 6579.148
11  1990 Argentina sphyraena       EP    EPb 3010.44 1.429576 4303.6530 6579.148
12  1990 Argentina sphyraena       EP    EPb 3010.44 1.429576 4303.6530 6579.148
13  1990 Argentina sphyraena       EP    EPb 3010.44 1.429576 4303.6530 6579.148
14  1990 Argentina sphyraena       EP    EPb 3010.44 1.429576 4303.6530 6579.148
15  1990 Argentina sphyraena       EP    EPa 1273.65 1.240259 1579.6562 6579.148

Any hint will be more than welcome. Thanks in advance


Solution

  • Update: As @Jahi Zamy's answer (+1) shows, this is also possible without any grouping. Grouping would still be useful in the real data set if the sum has to be computed separately for different groups:

    library(dplyr)

    df %>% 
      mutate(sumsect = sum(unique(md_peso * md_area)))
    

    First answer: You can do it this way with dplyr. The trick is to use group_by(), then ungroup(), and then sum the unique values. (If you want the sum for specific groups, use group_by() on the desired group instead of ungroup().)

    df %>% 
      group_by(sector) %>% 
      mutate(y = md_peso * md_area) %>% 
      ungroup() %>% 
      mutate(sumsect = sum(unique(y)), .keep="unused")
    
       fecha cientifico          dem_sect sector md_area md_peso dummy sumsect
       <dbl> <fct>               <fct>    <chr>    <dbl>   <dbl> <dbl>   <dbl>
     1  1990 Argentina sphyraena EP       EPb      3010.    1.43 4304.   6579.
     2  1990 Argentina sphyraena EP       EPc       666.    1.04  696.   6579.
     3  1990 Argentina sphyraena EP       EPc       666.    1.04  696.   6579.
     4  1990 Argentina sphyraena EP       EPb      3010.    1.43 4304.   6579.
     5  1990 Argentina sphyraena EP       EPa      1274.    1.24 1580.   6579.
     6  1990 Argentina sphyraena EP       EPa      1274.    1.24 1580.   6579.
     7  1990 Argentina sphyraena EP       EPb      3010.    1.43 4304.   6579.
     8  1990 Argentina sphyraena EP       EPc       666.    1.04  696.   6579.
     9  1990 Argentina sphyraena EP       EPb      3010.    1.43 4304.   6579.
    10  1990 Argentina sphyraena EP       EPb      3010.    1.43 4304.   6579.
    11  1990 Argentina sphyraena EP       EPb      3010.    1.43 4304.   6579.
    12  1990 Argentina sphyraena EP       EPb      3010.    1.43 4304.   6579.
    13  1990 Argentina sphyraena EP       EPb      3010.    1.43 4304.   6579.
    14  1990 Argentina sphyraena EP       EPb      3010.    1.43 4304.   6579.
    15  1990 Argentina sphyraena EP       EPa      1274.    1.24 1580.   6579.
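As a sketch of the per-group variant mentioned above, the sum can be computed inside group_by(dem_sect). This version picks one product per sector with !duplicated(sector) instead of unique(); that is a swapped-in choice, made on the assumption that two different sectors could happen to share the same product, in which case unique() would count it only once. The helper vectors area and peso are hypothetical names used only to rebuild the relevant columns of the sample data compactly.

```r
library(dplyr)

# Per-sector values copied from the question's sample data.
# `area` and `peso` are hypothetical helper names for this sketch.
area <- c(EPa = 1273.65, EPb = 3010.44, EPc = 665.88)
peso <- c(EPa = 1.24025925925926, EPb = 1.42957605985037, EPc = 1.04499099099099)
sector <- c("EPb", "EPc", "EPc", "EPb", "EPa", "EPa", "EPb", "EPc",
            "EPb", "EPb", "EPb", "EPb", "EPb", "EPb", "EPa")

# Rebuild only the columns needed for the calculation.
df <- data.frame(
  dem_sect = substr(sector, 1, 2),
  sector   = sector,
  md_area  = unname(area[sector]),
  md_peso  = unname(peso[sector])
)

res <- df %>%
  group_by(dem_sect) %>%
  # Keep the product from the first row of each sector, then sum within dem_sect.
  mutate(sumsect = sum((md_peso * md_area)[!duplicated(sector)])) %>%
  ungroup()

res
```

With the sample data there is only one dem_sect group, so every row gets the same sumsect of about 6579.148; with several groups each one would get its own sum.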