I want to calculate the average of certain rows in a column based on another variable. dfin
is the original df. I want to create a df like dfout
dfin <- data.frame(c1 = c("a1","a1","a1","a1","a1","a1","a2","a2","a2","a2","a2","a2","a3","a3","a3","a3","a3","a3"),
c2 = c("b1","b1","b2","b2","b3","b3","b4","b4","b5","b5","b6","b6","b7","b7","b8","b8","b9","b9"),
c3 = c(1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18))
dfout <- data.frame(c1 = c("a1","a1","a1","a2","a2","a2","a3","a3","a3"),
c2 = c("b1","b2","b3","b4","b5","b6","b7","b8","b9"),
c3 = c(1.5,3.5,5.5,7.5,9.5,11.5,13.5,15.5,17.5))
I want to calculate the average of rows in c3 based on c2 information. dfin
has three columns c1
, c2
, and c3
c1 has a1, a2, a3
, c2 has b1, b2, b3, to b9
, and c3
comprises values.
As can be seen in dfout
, I want to create a new df that has calculated average of values in c3 based on c2 groups. Also keeping c1 information.
Any help would be much appreciated.
Do you want this?
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
dfin <- data.frame(c1 = c("a1","a1","a1","a1","a1","a1","a2","a2","a2","a2","a2","a2","a3","a3","a3","a3","a3","a3"),
c2 = c("b1","b1","b2","b2","b3","b3","b4","b4","b5","b5","b6","b6","b7","b7","b8","b8","b9","b9"),
c3 = c(1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18))
dfin %>%
group_by(c1, c2) %>%
summarise(c3 = mean(c3), .groups = 'drop')
#> # A tibble: 9 x 3
#> c1 c2 c3
#> <chr> <chr> <dbl>
#> 1 a1 b1 1.5
#> 2 a1 b2 3.5
#> 3 a1 b3 5.5
#> 4 a2 b4 7.5
#> 5 a2 b5 9.5
#> 6 a2 b6 11.5
#> 7 a3 b7 13.5
#> 8 a3 b8 15.5
#> 9 a3 b9 17.5
Created on 2022-01-19 by the reprex package (v2.0.1)