
Cannot get sum per group in data tbl when using dplyr in R


I'm using dplyr to try to get the means of 6 variables across 3 groups, and I also want the count for each cell (i.e., I want to add a column of counts for each group-variable pair).

My code is something like this:

bitul_reason_tbl <- bitul_reason_calc %>% group_by(segment_name) %>% summarize(Total_Count=n(),
                                                       better_insurance = mean(better_insurance),count1=sum(bitul_reason_calc$better_insurance),
                                                       blank = mean(blank), count2=sum(bitul_reason_calc$blank),
                                                       kefel = mean(kefel), count3=sum(bitul_reason_calc$kefel),
                                                       no_need = mean(no_need), count4=sum(bitul_reason_calc$no_need),
                                                       other = mean(other), count5=sum(bitul_reason_calc$other),
                                                       price = mean(price), count6=sum(bitul_reason_calc$price),
                                                       sherut = mean(sherut),count7=sum(bitul_reason_calc$sherut))

The variables are all 0s or 1s, so summing is the same as counting. But instead of the sum per group, I get the total sum of each variable repeated 3 times. What's wrong?

# A tibble: 3 x 14
        segment_name Total_Count      price count1      kefel count2     sherut count3   nothing count4      other count5     blank count6
              <fctr>       <int>      <dbl>  <dbl>      <dbl>  <dbl>      <dbl>  <dbl>     <dbl>  <dbl>      <dbl>  <dbl>     <dbl>  <dbl>
1         briut_siud         277 0.11552347     69 0.02527076     22 0.04693141     27 0.1227437    101 0.05776173     81 0.6498195    465
2 vetek_up_half_year         225 0.09333333     69 0.02666667     22 0.03111111     27 0.1288889    101 0.14222222     81 0.5866667    465
3             teunot         247 0.06477733     69 0.03643725     22 0.02834008     27 0.1538462    101 0.13360324     81 0.6194332    465

Solution

  • OK, so the solution that worked for me (strangely) was to switch the order in which I call sum() and mean() inside summarize(). This is weird, but it worked.
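
A likely explanation for why the order matters, offered as a sketch rather than a confirmed diagnosis: bitul_reason_calc$price pulls the whole column from the ungrouped data frame, so that sum is identical for every group. Separately, in current dplyr versions a summary such as price = mean(price) overwrites price for the rest of the summarize() call, so a later sum(price) would see the mean instead of the original 0/1 values. The example below uses a made-up toy data frame (toy, with invented values) and hypothetical output names (price_count, price_mean, and so on); it refers to columns by bare name and gives the means new names so that no input column is overwritten.

```r
library(dplyr)

# Hypothetical toy data standing in for bitul_reason_calc;
# the values are invented purely for illustration.
toy <- tibble(
  segment_name = c("a", "a", "a", "b", "b"),
  price        = c(1, 0, 1, 0, 0),
  kefel        = c(0, 0, 1, 1, 1)
)

per_group <- toy %>%
  group_by(segment_name) %>%
  summarize(
    Total_Count = n(),
    # Bare column names are evaluated within each group;
    # toy$price would instead return the full, ungrouped column.
    price_count = sum(price),
    kefel_count = sum(kefel),
    # Means get new names, so `price` and `kefel` are never overwritten.
    price_mean  = mean(price),
    kefel_mean  = mean(kefel)
  )

print(per_group)
```

With distinct names for the counts and the means, the order of sum() and mean() inside summarize() should no longer matter.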