Search code examples
rstatisticsrowsmean

Calculate the mean of specific rows


I have a data that contains 10 lines (samples) that are either Class1 or Class2 and columns.

I want to calculate the mean of rows Class1 and make the result in line 11 and calculate the mean of rows Class2 and make the result in a line 12.

The data:

Name   ClassType   Col1   Col2   Col3
A      Class1       10     50    12
B      Class2        7     20    12
C      Class1        8     12     8
D      Class1        9     14    17
E      Class2        3     15    14
F      Class2       10     15    16
G      Class2       12     22    15
H      Class1       10     28    10

The result I want:

       Name   ClassType    Col1   Col2   Col3     
        A      Class1       10     50    12        
        B      Class2        7     20    12
        C      Class1        8     12     8
        D      Class1        9     14    17
        E      Class2        3     15    14
        F      Class2       10     15    16
        G      Class2       12     22    15
        H      Class1       10     28    10
 Mean class1    NA          9.25   26    11.75
 Mean class2    NA          8      18    14.25

Solution

  • Here is another tidyverse option, but doing everything in one pipe. We can first group by ClassType, then summarize over any columns that start with Col. Also, in the summarize statement, we can create the additional names that you want for the bottom rows (i.e., Mean Class 1 and Mean Class 2) and just return NA for ClassType. Then, we can use bind_rows to bind the original input dataframe to the new one (denoted by the .) from the previous step.

    df %>%
      group_by(ClassType) %>%
      summarize(Name = paste0("Mean ", unique(ClassType)),
                across(starts_with("Col"), mean),
                ClassType = NA) %>%
      bind_rows(df, .)
    

    Output

              Name ClassType  Col1 Col2  Col3
    1            A    Class1 10.00   50 12.00
    2            B    Class2  7.00   20 12.00
    3            C    Class1  8.00   12  8.00
    4            D    Class1  9.00   14 17.00
    5            E    Class2  3.00   15 14.00
    6            F    Class2 10.00   15 16.00
    7            G    Class2 12.00   22 15.00
    8            H    Class1 10.00   28 10.00
    9  Mean Class1      <NA>  9.25   26 11.75
    10 Mean Class2      <NA>  8.00   18 14.25
    

    Data

    df <- structure(
      list(
        Name = c("A", "B", "C", "D", "E", "F", "G", "H"),
        ClassType = c(
          "Class1",
          "Class2",
          "Class1",
          "Class1",
          "Class2",
          "Class2",
          "Class2",
          "Class1"
        ),
        Col1 = c(10L, 7L, 8L, 9L, 3L,
                 10L, 12L, 10L),
        Col2 = c(50L, 20L, 12L, 14L, 15L, 15L, 22L, 28L),
        Col3 = c(12L, 12L, 8L, 17L, 14L, 16L, 15L, 10L)
      ),
      class = "data.frame",
      row.names = c(NA,-8L)
    )