Search code examples
raverage

Data wrangling - calculate average of certain rows in a variable based on other column's information


I want to calculate the average of certain rows in a column based on another variable. dfin is the original df. I want to create a df like dfout

dfin <- data.frame(c1 = c("a1","a1","a1","a1","a1","a1","a2","a2","a2","a2","a2","a2","a3","a3","a3","a3","a3","a3"),
                 c2 = c("b1","b1","b2","b2","b3","b3","b4","b4","b5","b5","b6","b6","b7","b7","b8","b8","b9","b9"),
                 c3 = c(1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18))

dfout <- data.frame(c1 = c("a1","a1","a1","a2","a2","a2","a3","a3","a3"),
                 c2 = c("b1","b2","b3","b4","b5","b6","b7","b8","b9"),
                 c3 = c(1.5,3.5,5.5,7.5,9.5,11.5,13.5,15.5,17.5))

I want to calculate the average of rows in c3 based on c2 information. dfin has three columns c1, c2, and c3

c1 has a1, a2, a3, c2 has b1, b2, b3, to b9, and c3 comprises values.

As can be seen in dfout, I want to create a new df that has calculated average of values in c3 based on c2 groups. Also keeping c1 information.

Any help would be much appreciated.


Solution

  • Do you want this?

    library(dplyr)
    #> 
    #> Attaching package: 'dplyr'
    #> The following objects are masked from 'package:stats':
    #> 
    #>     filter, lag
    #> The following objects are masked from 'package:base':
    #> 
    #>     intersect, setdiff, setequal, union
    
    dfin <- data.frame(c1 = c("a1","a1","a1","a1","a1","a1","a2","a2","a2","a2","a2","a2","a3","a3","a3","a3","a3","a3"),
                       c2 = c("b1","b1","b2","b2","b3","b3","b4","b4","b5","b5","b6","b6","b7","b7","b8","b8","b9","b9"),
                       c3 = c(1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18))
    
    dfin %>% 
      group_by(c1, c2) %>% 
      summarise(c3 = mean(c3), .groups = 'drop')
    #> # A tibble: 9 x 3
    #>   c1    c2       c3
    #>   <chr> <chr> <dbl>
    #> 1 a1    b1      1.5
    #> 2 a1    b2      3.5
    #> 3 a1    b3      5.5
    #> 4 a2    b4      7.5
    #> 5 a2    b5      9.5
    #> 6 a2    b6     11.5
    #> 7 a3    b7     13.5
    #> 8 a3    b8     15.5
    #> 9 a3    b9     17.5
    

    Created on 2022-01-19 by the reprex package (v2.0.1)